I want to insert 1 million rows into my database with this code, but it only inserts 1000 and I don't know why.
I have two CSV files, each with 1000 rows like this:
Katherina,Rasmus,82-965-3140,29/09/1962,krasmus8thetimescouk
import psycopg2
import csv

print("\n")

csv_file1 = open('/home/oscarg/Downloads/base de datos/archivo1.csv', "r")
csv_file2 = open('/home/oscarg/Downloads/base de datos/archivo2.csv', "r")

try:
    connection = psycopg2.connect(user="oscar",
                                  password="",
                                  host="127.0.0.1",
                                  port="5432",
                                  database="challenge6_7")
    cursor = connection.cursor()
    csv_reader1 = csv.reader(csv_file1, delimiter=',')
    for row in csv_reader1:
        csv_reader2 = csv.reader(csv_file2, delimiter=',')
        contador=+1
        for row2 in csv_reader2:
            nombre = row[0] + " " + row2[0]
            apellido = row[1] + " " + row2[1]
            cedula_id = row[2] + row2[2]
            if not (contador % 1000):
                fecha_nacimiento = "'" + row[3] + "'"
            else:
                fecha_nacimiento = "'" + row2[3] + "'"
            if not (contador % 3):
                email = row[4] + "#hotmail.com"
            else:
                email = row2[4] + "#gmail.com"
            postgres_insert_query = "INSERT INTO cliente (nombre, apellido, cedula_id, fecha_nacimiento, cliente_email) VALUES (%s, %s, %s, %s, %s)"
            record_to_insert = (nombre, apellido, cedula_id, fecha_nacimiento, email)
            cursor.execute(postgres_insert_query, record_to_insert)
            connection.commit()
            if (contador == 1000):
                contador = 0
except (Exception, psycopg2.Error) as error:
    print(error.pgerror)
finally:
    # closing database connection
    if (connection):
        cursor.close()
        connection.close()
        print("PostgreSQL connection is closed")
    csv_file1.close()
    csv_file2.close()
It inserts 1000 rows and then stops. Is it a problem with my code, with psycopg2, or with my database?
It is possible that the reader's file pointer is exhausted (end of file) after the first pass, so on the second iteration nothing is read from the second CSV file.
You might want to store the rows in a list first, then iterate over that list.
See: Python import csv to list
Edit: This is the issue. I made a little test myself.
import csv

csv_file1 = open("a.csv", "r")
csv_file2 = open("1.csv", "r")

csv_reader1 = csv.reader(csv_file1, delimiter=',')
for row in csv_reader1:
    csv_file2 = open("1.csv", "r")  # Removing this line makes the code run N times
                                    # instead of N x N (a million in your example).
    csv_reader2 = csv.reader(csv_file2, delimiter=',')
    for row2 in csv_reader2:
        print(row, row2)
I tested it by re-opening the file (not the reader) in the first loop. However, opening the file again and again is not the best practice; you should store the rows in a list if you don't have memory limitations.
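For instance, a minimal sketch of the list-based fix (paths shortened from the question):

import csv

# read both files into memory once; lists can be iterated any number of times
with open('archivo1.csv') as f1, open('archivo2.csv') as f2:
    rows1 = list(csv.reader(f1))
    rows2 = list(csv.reader(f2))

for row in rows1:
    for row2 in rows2:
        print(row, row2)  # 1000 x 1000 = one million combinations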
import csv
import psycopg2

conn = psycopg2.connect(database=" ", user=" ", password=" ", host=" ", port= )
cur = conn.cursor()
with open('21.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'temp_questions', sep=',')
conn.commit()
I have tried to insert data into my database and I got this error:
cur.copy_from(f, 'temp_questions', sep=',')
psycopg2.errors.QueryCanceled: COPY from stdin failed: error in .read() call: exceptions.ValueError Mixing iteration and read methods would lose data
CONTEXT: COPY temp_questions, line 1
My CSV file has 18 columns, and the table in the database has an id plus 18 columns.
I don't know how to insert the data.
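For what it is worth, this ValueError usually comes from the Python 2 file object itself: next(f) starts read-ahead iteration buffering, and copy_from() then calls f.read(), which cannot be mixed with iteration. Skipping the header with readline() instead avoids the conflict; a sketch, with connection parameters as placeholders:

import psycopg2

conn = psycopg2.connect(database="...", user="...", password="...", host="...", port=5432)
cur = conn.cursor()
with open('21.csv', 'r') as f:
    f.readline()  # skip the header row without starting iteration
    cur.copy_from(f, 'temp_questions', sep=',')
conn.commit()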
import csv
import sqlite3  # assumed: the '?' placeholders and executemany() suggest the sqlite3 driver

db = sqlite3.connect('test.db')
print("connected successfully")

csv_file = "test.csv"
with open(csv_file, 'r') as csv_file:
    csvreader = csv.reader(csv_file)
    fields = next(csvreader)  # skip the header row
    sql_insert_query = 'INSERT INTO Test (name, age) VALUES (?, ?)'
    db.executemany(sql_insert_query, csvreader)
    db.commit()  # persist the inserts
    print("inserted")
    data = db.execute("SELECT * FROM Test")
    for i in data:
        print(i)
Read the data from the CSV file and use executemany to insert the list of rows into the database.
I'm new to Python, and my task is to import a CSV into a MySQL database. I have these sample values inside my CSV file:
SHA1,VSDT,TRX
005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 (EXE 7-2)),Ransom.Win32.TRX.XXPE50FFF027,
006ea7ce2768fa208ec7dfbf948bffda9da09e4e,WIN32 EXE 2-2,TROJ.Win32.TRX.XXPE50FFF027,
My problem here is: how can I remove "(" and ")" only at the start and end of the string in the second column before importing to the database?
I have this code to import the CSV:
import csv
import mysql.connector

# mydb = mysql.connector.connect(...)  # the connection setup is omitted in the question
file = open(fullPath, 'rb')
csv_data = csv.reader(file)
mycursor = mydb.cursor()
cursor = mydb.cursor()
for row in csv_data:
    cursor.execute('INSERT INTO jeremy_table_test (sha1, vsdt, trendx) '
                   'VALUES (%s, %s, %s)', (row[0], row[1], row[2]))
mydb.commit()
cursor.close()
print("Done")
Skip the header row when you read it in, rather than when you write it.
with open(fullPath, 'rb') as file:
    csv_data = csv.reader(file)
    next(csv_data)  # skip the header row
    mycursor = mydb.cursor()
    cursor = mydb.cursor()
    for row in csv_data:
        cursor.execute('INSERT INTO jeremy_table_test (sha1, vsdt, trendx) '
                       'VALUES (%s, %s, %s)', (row[0], row[1], row[2]))
    mydb.commit()
    cursor.close()
    print("Done")
MySQL's LOAD DATA tool can probably do what you want here. Here is what the LOAD DATA call might look like:
LOAD DATA INFILE 'path/to/rb'
INTO TABLE jeremy_table_test
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n' -- or '\n'
IGNORE 1 LINES
(sha1, #var1, trendx)
SET vsdt = TRIM(TRAILING ')' FROM TRIM(LEADING '(' FROM #var1));
To make this call from your Python code, you may try something like this:
query = "LOAD DATA INFILE 'path/to/rb' INTO TABLE jeremy_table_test FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (sha1, #var1, trendx) SET vsdt = TRIM(TRAILING ')' FROM TRIM(LEADING '(' FROM #var1))"
cursor.execute(query)
connection.commit()
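One caveat (an assumption about your setup, not something stated in the question): if the CSV lives on the client machine rather than on the server, you would need LOAD DATA LOCAL INFILE instead, and recent MySQL clients disable that by default. With mysql-connector-python the opt-in looks roughly like this, with placeholder connection parameters:

import mysql.connector

# allow_local_infile lets the client send a local file to the server
mydb = mysql.connector.connect(host='localhost', user='...', password='...',
                               database='...', allow_local_infile=True)
# then run the "LOAD DATA LOCAL INFILE ..." statement through a cursor as above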
I have Postgres tables and I want to run a PostgreSQL script file on these tables using Python, and then write the result of the queries to a CSV file. The script file has multiple queries separated by semicolons (;). A sample script is shown below.
Script file:
--Duplication Check
select p.*, c.name
from scale_polygons_v3 c inner join cartographic_v3 p
on (metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20)) AND c.kind NOT IN (9,10)
where ST_Contains(c.geom, p.geom);
--Area Check
select sp.areaid,sp.name_displ,p.road_id,p.name
from scale_polygons_v3 sp, pak_roads_20162207 p
where st_contains(sp.geom,p.geom) and sp.kind = 1
and p.areaid != sp.areaid;
When I run the Python code, it executes successfully without any error, but only the result of the last executed query is written to the CSV file: the first query's result is replaced by the second's, the second by the third, and so on, until only the last query's result remains.
Here is my Python code:
import psycopg2
import sys
import csv
import datetime, time

def run_sql_file(filename, connection):
    '''
    The function takes a filename and a connection as input
    and will run the SQL query on the given connection
    '''
    start = time.time()
    file = open(filename, 'r')
    sql = " ".join(file.readlines())
    #sql = sql1[3:]
    print "Start executing: " + " at " + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M")) + "\n"
    print "Query:\n", sql + "\n"
    cursor = connection.cursor()
    cursor.execute(sql)
    records = cursor.fetchall()
    with open('Report.csv', 'a') as f:
        writer = csv.writer(f, delimiter=',')
        for row in records:
            writer.writerow(row)
    connection.commit()
    end = time.time()
    row_count = sum(1 for row in records)
    print "Done Executing:", filename
    print "Number of rows returned:", row_count
    print "Time elapsed to run the query:", str((end - start) * 1000) + ' ms'
    print "\t ==============================="

def main():
    connection = psycopg2.connect("host='localhost' dbname='central' user='postgres' password='tpltrakker'")
    run_sql_file("script.sql", connection)
    connection.close()

if __name__ == "__main__":
    main()
What is wrong with my code?
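A note on the cause: psycopg2 will happily execute a multi-statement string, but cursor.fetchall() only returns the result set of the last statement, so the earlier results are never written. A minimal sketch of one direct fix is to split the script on semicolons and fetch each statement separately (naive if any string literal in the SQL contains ';'):

for stmt in sql.split(';'):
    if not stmt.strip():
        continue  # skip the empty chunk after the final semicolon
    cursor.execute(stmt)
    with open('Report.csv', 'a') as f:
        csv.writer(f).writerows(cursor.fetchall())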
If you are able to change the SQL script a bit then here is a workaround:
#!/usr/bin/env python
import psycopg2

script = '''
    declare cur1 cursor for
    select * from (values(1,2),(3,4)) as t(x,y);

    declare cur2 cursor for
    select 'a','b','c';
'''
print script

conn = psycopg2.connect('')
# Cursors exist and are available only inside the transaction
conn.autocommit = False
# Create the cursors from the script
conn.cursor().execute(script)
# Read the names of the cursors
cursors = conn.cursor()
cursors.execute('select name from pg_cursors;')
cur_names = cursors.fetchall()
# Read data from each available cursor
for cname in cur_names:
    print cname[0]
    cur = conn.cursor()
    cur.execute('fetch all from ' + cname[0])
    rows = cur.fetchall()
    # Here you can save the data to the file
    print rows
conn.rollback()
print 'done'
Disclaimer: I am a total newbie with Python.
The simplest way to output each query to a different file is copy_expert:
query = '''
select p.*, c.name
from
scale_polygons_v3 c
inner join
cartographic_v3 p on metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20) and c.kind not in (9,10)
where ST_Contains(c.geom, p.geom)
'''
copy = "copy ({}) to stdout (format csv)".format(query)
f = open('Report.csv', 'wb')
cursor.copy_expert(copy, f, size=8192)
f.close()
query = '''
select sp.areaid,sp.name_displ,p.road_id,p.name
from scale_polygons_v3 sp, pak_roads_20162207 p
where st_contains(sp.geom,p.geom) and sp.kind = 1 and p.areaid != sp.areaid;
'''
copy = "copy ({}) to stdout (format csv)".format(query)
f = open('Report2.csv', 'wb')
cursor.copy_expert(copy, f, size=8192)
f.close()
If you want to append the second output to the same file, just keep the first file object open.
Note that the COPY must output to STDOUT to make the result available to copy_expert.
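A sketch of that single-file variant, with query1 and query2 standing in for the two query strings above:

# keep one file object open so both COPY outputs land in the same file
with open('Report.csv', 'wb') as f:
    for query in (query1, query2):
        copy = "copy ({}) to stdout (format csv)".format(query)
        cursor.copy_expert(copy, f, size=8192)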
Here is what I am trying to achieve: my current code is working fine and runs the query on my SQL Server, but I will need to gather information from several servers. How would I add a column with the DB server listed in that column?
import pyodbc
import csv

f = open("dblist.ini")
dbserver, UID, PWD = [variable[variable.find("=") + 1:] for variable in f.readline().split("~")]
connectstring = "DRIVER={SQL Server};SERVER=" + dbserver + ";DATABASE=master;UID=" + UID + ";PWD=" + PWD
cnxn = pyodbc.connect(connectstring)
cursor = cnxn.cursor()

fd = open('mssql1.txt', 'r')
sqlFile = fd.read()
fd.close()

cursor.execute(sqlFile)
with open("out.csv", "wb") as csv_file:
    csv_writer = csv.writer(csv_file, delimiter='!')
    csv_writer.writerow([i[0] for i in cursor.description])  # write headers
    csv_writer.writerows(cursor)
You could add the extra information in your SQL query. For example:
select 'dbServerName' as dbserver, * from table;
Your cursor will return an extra column in front of your real data that carries the DB server name. The downside to this method is that you transfer a little extra data.
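Alternatively, a sketch on the Python side, reusing dbserver and cursor from the question: prepend the server name to each row as you write the CSV, instead of changing the SQL:

csv_writer.writerow(['dbserver'] + [i[0] for i in cursor.description])  # headers plus server column
for row in cursor:
    csv_writer.writerow([dbserver] + list(row))  # tag each row with its source server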
I use psycopg2 to connect PostgreSQL and Python, and here's my script:
import sys

# set up the psycopg2 environment
import psycopg2

# driving_distance module
query = """
    select *
    from driving_distance ($$
        select
            gid as id,
            start_id::int4 as source,
            end_id::int4 as target,
            shape_leng::double precision as cost
        from network
        $$, %s, %s, %s, %s
    )
"""

# make the connection between Python and PostgreSQL
conn = psycopg2.connect("dbname = 'TC_routing' user = 'postgres' host = 'localhost' password = '****'")
cur = conn.cursor()

# count rows in the table
cur.execute("select count(*) from network")
result = cur.fetchone()
k = result[0] + 1

# run loops
rs = []
i = 1
while i <= k:
    cur.execute(query, (i, 1000000, False, False))
    rs.append(cur.fetchall())
    i = i + 1

h = 0
ars = []
element = list(rs)
while h <= 15:
    rp = element[0][h][2]
    ars.append(rp)
    h = h + 1

print ars
conn.close()
the output is fine,
[0.0, 11810.7956476379, 16018.6818979217, 18192.3576530232, 21507.7366792666, 25819.1955059578, 26331.2523709618, 49447.0908955008, 28807.7871013087, 39670.8579371438, 42723.0239515299, 38719.7320396044, 38265.4435766971, 40744.8813155033, 43770.2158657742, 46224.8748774639]
but if I add the lines below in order to export the results to a CSV file, I get this error:
import csv

with open('test.csv', 'wb') as f:
    writer = csv.writer(f, delimiter=',')
    for row in ars:
        writer.writerow(row)
[0.0, 11810.7956476379, 16018.6818979217, 18192.3576530232, 21507.7366792666, 25819.1955059578,
26331.2523709618, 49447.0908955008, 28807.7871013087, 39670.8579371438, 42723.0239515299, 38719.7320396044, 38265.4435766971, 40744.8813155033, 43770.2158657742, 46224.8748774639]
Traceback (most recent call last):
File "C:/Users/Heinz/Desktop/python_test/distMatrix_test.py", line 54, in <module>
writer.writerow(row)
Error: sequence expected
How can I fix this?
I am working with Python 2.7.6 and PyScripter under Windows 8.1 x64. Feel free to give me any suggestion, thanks a lot!
import csv

with open('test.csv', 'wb') as f:
    writer = csv.writer(f, delimiter=',')
    for row in ars:
        writer.writerow(row)
ars is just a single list, so your for loop does not extract a row from ars: it takes a single element (a float) from the list and tries to write it as a row.
Try replacing it with
for row in ars:
    writer.writerow([row])
This will write each element as a row in the CSV file.
Or, if you want the output to be a single row, just don't use the for loop; instead use
writer.writerow(ars)