I have a Python script that reads a large (4 GB) CSV file into MySQL. It works as is, but it is extremely slow: the CSV file has over 4 million rows, and inserting all the records into the database is taking forever.
Could I get an example of how I would use executemany in this situation?
Here is my code:
import csv
import os
from datetime import datetime

# mydb and cursor are assumed to be an open MySQL connection and its cursor
source = os.path.join('source_files', 'aws_bills', 'march-bill-original-2019.csv')
with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    insert_sql = """ INSERT INTO billing_info (InvoiceId, PayerAccountId, LinkedAccountId, RecordType, RecordId, ProductName, RateId, SubscriptionId, PricingPlanId, UsageType, Operation, AvailabilityZone, ReservedInstance, ItemDescription, UsageStartDate, UsageEndDate, UsageQuantity, BlendedRate, BlendedCost, UnBlendedRate, UnBlendedCost, ResourceId, Engagement, Name, Owner, Parent) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) """
    #for row in csv_reader:
    for row_idx, row in enumerate(csv_reader):
        try:
            # one INSERT and one commit per row: this is the slow part
            cursor.execute(insert_sql,row)
            #cursor.executemany(insert_sql, 100)
            mydb.commit()
            print('row', row_idx, 'inserted with LinkedAccountId', row[2], 'at', datetime.now().isoformat())
        except Exception as e:
            print("MySQL Exception:", e)
print("Done importing data.")
Again, that code works to insert the records into the database. But I am hoping to speed this up with executemany if I can get an example of how to do that.
Good Night
I see that the question is a little old, so I don't know if you still need this.
I did something similar recently. I first converted the CSV into a list of rows, because that is the form of data executemany accepts, and then ran the request by passing the INSERT statement together with that list. In your case it would look like this:
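import pandas as pd

# Reads the whole file into memory; for a very large CSV, read_csv also accepts chunksize
df = pd.read_csv(r'path_your_csv')
# Cast everything to str for the driver (note: missing values become the string 'nan')
df1 = df.astype(str)
# executemany expects a sequence of rows, so turn the frame into a list of lists
List_Values = df1.values.tolist()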
insert_sql = """ INSERT INTO billing_info (InvoiceId, PayerAccountId, LinkedAccountId, RecordType, RecordId, ProductName, RateId, SubscriptionId, PricingPlanId, UsageType, Operation, AvailabilityZone, ReservedInstance, ItemDescription, UsageStartDate, UsageEndDate, UsageQuantity, BlendedRate, BlendedCost, UnBlendedRate, UnBlendedCost, ResourceId, Engagement, Name, Owner, Parent) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) """
cursor.executemany(insert_sql, List_Values)
mydb.commit()  # assuming mydb is the connection object from the question
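If holding the whole 4 GB file in memory is a concern, another option is to stream the CSV and hand executemany a limited number of rows at a time. Below is a minimal sketch of that idea, not a tuned implementation. It assumes a DB-API connection such as the `mydb` object from the question; the helper names `iter_batches` and `load_csv` and the default batch size of 1000 are made up for illustration.

```python
import csv
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable of rows."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def load_csv(connection, csv_path, insert_sql, batch_size=1000):
    """Insert the CSV at csv_path in batches; return the number of rows sent."""
    total = 0
    with open(csv_path, newline='') as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row, as the original script does
        cursor = connection.cursor()
        try:
            for batch in iter_batches(reader, batch_size):
                cursor.executemany(insert_sql, batch)
                connection.commit()  # one commit per batch, not per row
                total += len(batch)
        finally:
            cursor.close()
    return total
```

With the objects from the question this would be called as `load_csv(mydb, source, insert_sql)`. Committing once per batch instead of once per row, and dropping the per-row print, are the changes most likely to help; the best batch size is something to measure rather than assume.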
Can anyone help? I have no idea why it keeps returning this error:
ERROR: Not enough arguments for format string
The script reads from a CSV whose headers are named Property ID, Reference Number, etc. The only difference from the table column names is the added underscore.
Here is my script for reference:
import csv
import pymysql
mydb = pymysql.connect(host='127.0.0.1', user='root', passwd='root', db='jupix', unix_socket="/Applications/MAMP/tmp/mysql/mysql.sock")
cursor = mydb.cursor()
with open("activeproperties.csv") as f:
    reader = csv.reader(f)
    # next(reader) # skip header
    data = []
    for row in reader:
        cursor.executemany('INSERT INTO ACTIVE_PROPERTIES(Property_ID, Reference_Number,Address_Name,Address_Number,Address_Street,Address_2,Address_3,Address_4,Address_Postcode,Owner_Contact_ID,Owner_Name,Owner_Number_Type_1,Owner_Contact_Number_1,Owner_Number_Type_2,Owner_Contact_Number_2,Owner_Number_Type_3,Owner_Contact_Number_3,Owner_Email_Address,Display_Property_Type,Property_Type,Property_Style,Property_Bedrooms,Property_Bathrooms,Property_Ensuites,Property_Toilets,Property_Reception_Rooms,Property_Kitchens,Floor_Area_Sq_Ft,Acres,Rent,Rent_Frequency,Furnished,Next_Available_Date,Property_Status,Office_Name,Negotiator,Date_Created)''VALUES(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', row)
mydb.commit()
cursor.close()
print "Imported!"
The error occurs because your INSERT statement lists 37 columns but contains 38 %s placeholders. The driver fills the placeholders from the values you pass in, and when there are more placeholders than values it runs out, which is what "Not enough arguments for format string" means. You either forgot to include a column in your INSERT INTO statement, or have an extra %s in it.
So either remove one %s from the SQL statement, or add the missing column (to the table and to the column list) so that the last value has somewhere to go. Also note that executemany expects a sequence of rows. Since your loop passes one row at a time, cursor.execute(sql, row) is the call that fits here; passing a single row to executemany can raise this same error.
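One way to avoid counting placeholders by hand is to generate them from the column list, so the two can never drift apart. This is only a sketch: `build_insert` and `check_row` are hypothetical helper names, the column list is shortened for illustration, and the table and column names are interpolated into the SQL text, so they must come from your own code and never from user input.

```python
def build_insert(table, columns):
    """Return an INSERT statement with exactly one %s per column."""
    column_list = ', '.join(columns)
    placeholders = ', '.join(['%s'] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


def check_row(columns, row):
    """Raise a readable error if a row does not have one value per column."""
    if len(row) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(row)}")


# Shortened, illustrative column list; the real table has 37 names.
columns = ['Property_ID', 'Reference_Number', 'Address_Name']
sql = build_insert('ACTIVE_PROPERTIES', columns)
```

Calling `check_row(columns, row)` before `cursor.execute(sql, row)` turns a mismatch into an error that names both counts, which is easier to act on than the driver's format-string message.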
When you have a large number of columns, it can be a challenge to make sure you have one %s for each column. There's an alternative syntax for INSERT that makes this easier.
Instead of this:
INSERT INTO ACTIVE_PROPERTIES(Property_ID, Reference_Number,
Address_Name, Address_Number, Address_Street, Address_2, Address_3,
Address_4, Address_Postcode, Owner_Contact_ID, Owner_Name,
Owner_Number_Type_1, Owner_Contact_Number_1, Owner_Number_Type_2,
Owner_Contact_Number_2, Owner_Number_Type_3, Owner_Contact_Number_3,
Owner_Email_Address, Display_Property_Type, Property_Type,
Property_Style, Property_Bedrooms, Property_Bathrooms, Property_Ensuites,
Property_Toilets, Property_Reception_Rooms, Property_Kitchens,
Floor_Area_Sq_Ft, Acres, Rent, Rent_Frequency, Furnished,
Next_Available_Date, Property_Status, Office_Name, Negotiator,
Date_Created)
VALUES(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,
%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,
%s, %s, %s, %s, %s)
Try the following, to make it easier to match up columns with %s parameters, so you don't miscount:
INSERT INTO ACTIVE_PROPERTIES
SET Property_ID = %s,
Reference_Number = %s,
Address_Name = %s,
Address_Number = %s,
Address_Street = %s,
Address_2 = %s,
Address_3 = %s,
Address_4 = %s,
Address_Postcode = %s,
Owner_Contact_ID = %s,
Owner_Name = %s,
Owner_Number_Type_1 = %s,
Owner_Contact_Number_1 = %s,
Owner_Number_Type_2 = %s,
Owner_Contact_Number_2 = %s,
Owner_Number_Type_3 = %s,
Owner_Contact_Number_3 = %s,
Owner_Email_Address = %s,
Display_Property_Type = %s,
Property_Type = %s,
Property_Style = %s,
Property_Bedrooms = %s,
Property_Bathrooms = %s,
Property_Ensuites = %s,
Property_Toilets = %s,
Property_Reception_Rooms = %s,
Property_Kitchens = %s,
Floor_Area_Sq_Ft = %s,
Acres = %s,
Rent = %s,
Rent_Frequency = %s,
Furnished = %s,
Next_Available_Date = %s,
Property_Status = %s,
Office_Name = %s,
Negotiator = %s,
Date_Created = %s;
It's not standard SQL, but MySQL supports it. Internally it does the same thing; it's just a more convenient syntax, at least when you're inserting a single row at a time.
A simple example; you can adapt it to your needs.
import pymysql

myDict = {'key-1': 193699, 'key-2': 206050, 'key-3': 0, 'key-N': 9999999}
# Backtick-quote the column names (needed for names containing '-')
columns = ', '.join(f"`{k}`" for k in myDict.keys())
# One %s per value, so the driver escapes the values instead of an f-string
placeholders = ', '.join(['%s'] * len(myDict))
sql = f"INSERT INTO YourTableName ({columns}) VALUES ({placeholders});"
conn = pymysql.connect(autocommit=True, host="localhost", database='DbName', user='UserName', password='PassWord')
with conn.cursor() as cursor:
    cursor.execute(sql, list(myDict.values()))
I am getting the error below:
con.commit()
^ SyntaxError: invalid syntax
Here is the code segment that loads a JSON file into MySQL using Python:
try:
    cursor.execute("""INSERT INTO vehicle (CarYear,make,model,cylinders,VClass,drive,trany,displ,eng_dscr,trans_dscr,mpgData,evMotor,
youSaveSpend,fuleType,fuleType1,barrelsA08,charge120,charge240,city08,city08U,cityA08,cityA08U,cityCD,cityE,cityUF,co2,coA2,co2TailpipeAGpm,
co2TailpipeGpm,comb08,comb08U,combA08,combA08U,combE,combinedCD,combinedUF,engld,feScore,fuelCost08,fuelCostA08,ghgScore,ghgScoreA,highway08,
highway08U,highwayA08,highwayA08U,highwayCD,highwayE,highwayUF,hlv,hpv,id,lv2,lv4,phevBlended,pv2,pv4,CarRange,rangeCity,rangeCityA,rangeHwy,
rangeHwyA,UCity,UCityA,UHighway,UHighwayA,guzzler,tCharger,sCharger,atvType,fuelType2,rangeA,mfrCode,c240Dscr,charge240b,c240bDscr,createdOn,
modifiedOn,startStop,phevCity,phevHwy,phevComb)
VALUES (%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,
%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s,%s,%s, %s, %s,%s, %s,
%s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s),
(CarYear, make, model, cylinders,VClass, drive, trany, displ, eng_dscr, trans_dscr,mpgData,evMotor,youSaveSpend,fuleType,fuleType1,
barrelsA08,charge120,charge240,city08,city08U,cityA08,cityA08U,cityCD,cityE,cityUF,co2,coA2,co2TailpipeAGpm,co2TailpipeGpm,comb08,
comb08U,combA08,combA08U,combE,combinedCD,combinedUF,engld,feScore,fuelCost08,fuelCostA08,ghgScore,ghgScoreA,highway08,highway08U,
highwayA08,highwayA08U,highwayCD,highwayE,highwayUF,hlv,hpv,id,lv2,lv4,phevBlended,pv2,pv4,CarRange,rangeCity,rangeCityA,rangeHwy,
rangeHwyA,UCity,UCityA,UHighway,UHighwayA,guzzler,tCharger,sCharger,atvType,fuelType2,rangeA,mfrCode,c240Dscr,charge240b,c240bDscr,
createdOn,modifiedOn,startStop,phevCity,phevHwy,phevComb))"""
    con.commit()
except pymysql.Error as e:
    raise
    sys.exit(1)
finally:
    if con:
        con.close()
You need to split the call into two arguments: close the SQL string (""" INSERT ... """) right after the VALUES clause, then pass the values as a separate tuple. Something like:
"""INSERT INTO vehicle (CarYear,make,model,cylinders,VClass,drive,trany,displ,eng_dscr,trans_dscr,mpgData,evMotor,
youSaveSpend,fuleType,fuleType1,barrelsA08,charge120,charge240,city08,city08U,cityA08,cityA08U,cityCD,cityE,cityUF,co2,coA2,co2TailpipeAGpm,
co2TailpipeGpm,comb08,comb08U,combA08,combA08U,combE,combinedCD,combinedUF,engld,feScore,fuelCost08,fuelCostA08,ghgScore,ghgScoreA,highway08,
highway08U,highwayA08,highwayA08U,highwayCD,highwayE,highwayUF,hlv,hpv,id,lv2,lv4,phevBlended,pv2,pv4,CarRange,rangeCity,rangeCityA,rangeHwy,
rangeHwyA,UCity,UCityA,UHighway,UHighwayA,guzzler,tCharger,sCharger,atvType,fuelType2,rangeA,mfrCode,c240Dscr,charge240b,c240bDscr,createdOn,
modifiedOn,startStop,phevCity,phevHwy,phevComb)
VALUES (%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,
%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s,%s,%s, %s, %s,%s, %s,
%s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s, %s, %s,%s)""",
(CarYear, make, model, cylinders,VClass, drive, trany, displ, eng_dscr, trans_dscr,mpgData,evMotor,youSaveSpend,fuleType,fuleType1,
barrelsA08,charge120,charge240,city08,city08U,cityA08,cityA08U,cityCD,cityE,cityUF,co2,coA2,co2TailpipeAGpm,co2TailpipeGpm,comb08,
comb08U,combA08,combA08U,combE,combinedCD,combinedUF,engld,feScore,fuelCost08,fuelCostA08,ghgScore,ghgScoreA,highway08,highway08U,
highwayA08,highwayA08U,highwayCD,highwayE,highwayUF,hlv,hpv,id,lv2,lv4,phevBlended,pv2,pv4,CarRange,rangeCity,rangeCityA,rangeHwy,
rangeHwyA,UCity,UCityA,UHighway,UHighwayA,guzzler,tCharger,sCharger,atvType,fuelType2,rangeA,mfrCode,c240Dscr,charge240b,c240bDscr,
createdOn,modifiedOn,startStop,phevCity,phevHwy,phevComb)
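Otherwise the whole thing, including the tuple of values, is just one string. In your code the string then ends without a closing parenthesis for cursor.execute(, which is why Python reports the syntax error on the next line, con.commit().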
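To make the shape of the call easier to see, here is a cut-down sketch with only three of the columns. It assumes the values arrive as a dict (for example one record parsed from the JSON file); the helper name `insert_vehicle` and the `COLUMNS` tuple are hypothetical, introduced only for this illustration.

```python
COLUMNS = ('CarYear', 'make', 'model')  # shortened for illustration


def insert_vehicle(cursor, record):
    """Insert one vehicle; record is a dict with a key per name in COLUMNS."""
    # First argument: the SQL text only. The triple-quoted string ends
    # right after the VALUES clause.
    sql = """INSERT INTO vehicle (CarYear, make, model)
             VALUES (%s, %s, %s)"""
    # Second argument: the values, as a real Python tuple outside the string.
    params = tuple(record[name] for name in COLUMNS)
    cursor.execute(sql, params)
    return params
```

After calling `insert_vehicle(cursor, record)` you would still call `con.commit()`, as in the original code. The point is that the driver receives the query and the values separately and does the substitution itself.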