Getting String Index Out of Range When Inserting Into PostgreSQL Table - Python

I'm new to Python and to programming in general, and I'm trying to read a .dat file and insert the data into a Postgres table.
I'm getting an error, and I've googled but could not come up with a resolution. Hoping someone can point me in the right direction.
Ratings table:
UserID int
MovieID int
Rating float
Ratings.dat:
1::122::5::838985046
1::185::5::838983525
Below is my code:
import psycopg2

ratingsfile = open('ml-10M100K/ratings.dat', 'r')
for line in ratingsfile:
    items = line.split('::')
    for values in items:
        curr.execute("INSERT INTO Ratings(UserID, MovieID, Rating) VALUES (%s, %s, %s)", values)
conn.commit()
ratingsfile.close()
Error:
curr.execute("INSERT INTO Ratings(UserID, MovieID, Rating)
VALUES (%s, %s, %s)", values)
IndexError: string index out of range

You do not need to iterate through items. Instead, you can pass the three values by index, items[0] through items[2], as a parameter tuple:
import psycopg2

ratingsfile = open('ml-10M100K/ratings.dat', 'r')
for line in ratingsfile:
    items = line.split('::')
    curr.execute("INSERT INTO Ratings(UserID, MovieID, Rating) VALUES (%s, %s, %s)",
                 (items[0], items[1], items[2]))
conn.commit()
ratingsfile.close()
This assumes, as in your example Ratings.dat, that UserID is 1 (items[0]), MovieID is 122 and 185 (items[1]), and Rating is 5 (items[2]). The nine-digit integers at the end of each row (Unix timestamps) can be accessed with items[3].
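As a side note, line.split('::') leaves the trailing newline attached to items[3], so strip the line first if you ever need the timestamp column. A minimal sketch of the same insert batched with executemany, assuming a psycopg2 connection (the dbname below is hypothetical):

import psycopg2

conn = psycopg2.connect("dbname=movies")  # hypothetical connection string
curr = conn.cursor()

with open('ml-10M100K/ratings.dat', 'r') as ratingsfile:
    # One (UserID, MovieID, Rating) triple per line, dropping the timestamp.
    rows = [line.strip().split('::')[:3] for line in ratingsfile]

# executemany runs the parameterized INSERT once per row.
curr.executemany(
    "INSERT INTO Ratings(UserID, MovieID, Rating) VALUES (%s, %s, %s)", rows)
conn.commit()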

Related

How to Upsert a pandas dataframe to postgres using INSERT ON CONFLICT DO UPDATE SET EXCLUDED query giving error

I have a pandas DataFrame, and I want to insert/update (upsert) it into a table. The conditions are:
New rows are inserted; this scenario adds the current timestamp to the INSERT_TIMESTAMP column and leaves UPDATE_TIMESTAMP and PREVIOUS_UPDATE_TIMESTAMP blank.
Existing rows (a row counts as existing if its primary key is already in the table) have their values updated, except for INSERT_TIMESTAMP. This scenario adds the current timestamp to the UPDATE_TIMESTAMP column, and copies the pre-update UPDATE_TIMESTAMP value into PREVIOUS_UPDATE_TIMESTAMP.
Here is my code where I am trying to upsert a dataframe into a table that is already created.
CODE is the primary key.
Again, as mentioned: insert the row if CODE is not present in the table, setting INSERT_TIMESTAMP to the current time. If CODE already exists in the table, update the row excluding INSERT_TIMESTAMP, set UPDATE_TIMESTAMP to the current time, and copy the row's pre-update UPDATE_TIMESTAMP into PREVIOUS_UPDATE_TIMESTAMP.
This code is giving me a tuple index out of range error.
for index, row in dataframe.iterrows():
    print(row)
    cur.execute("""INSERT INTO TABLE_NAME
                       (CODE, NAME, CODE_GROUP,
                        INDICATOR, INSERT_TIMESTAMP,
                        UPDATE_SOURCE, IDD, INSERT_SOURCE)
                   VALUES (%s, %s, %s, %s, NOW(), %s, %s, %s)
                   ON CONFLICT(CODE)
                   DO UPDATE SET
                       NAME = %s,
                       CODE_GROUP = %s,
                       INDICATOR = %s,
                       UPDATE_TIMESTAMP = NOW(),
                       UPDATE_SOURCE = %s,
                       IDD = %s, INSERT_SOURCE = %s,
                       PREV_UPDATE_TIMESTAMP = EXCLUDED.UPDATE_TIMESTAMP""",
                (row["CODE"],
                 row['NAME'],
                 row['CODE_GROUP'],
                 row['INDICATOR'],
                 row['UPDATE_SOURCE'],
                 row['IDD'],
                 row['INSERT_SOURCE']))
conn.commit()
cur.close()
conn.close()
Please help me find where it is going wrong. Should I repeat in the UPDATE clause all the columns I mentioned in the INSERT statement? Because in the INSERT, UPDATE_TIMESTAMP and PREVIOUS_UPDATE_TIMESTAMP should be null, while in the UPDATE, INSERT_TIMESTAMP has to stay the same as it was before.
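For reference, psycopg2 raises IndexError: tuple index out of range when a query has more placeholders than supplied parameters, and this statement has 13 %s markers but only 7 values. One way to avoid repeating parameters in the DO UPDATE branch is to reference EXCLUDED (the row proposed for insertion) for the incoming values, and the table itself for the stored, pre-update value; note that EXCLUDED.UPDATE_TIMESTAMP would be the incoming (null) value, not the stored one. A minimal sketch, assuming the TABLE_NAME placeholder and column names from the question, with INSERT_TIMESTAMP left out of the SET list so it keeps its original value:

upsert_sql = """
    INSERT INTO TABLE_NAME
        (CODE, NAME, CODE_GROUP, INDICATOR, INSERT_TIMESTAMP,
         UPDATE_SOURCE, IDD, INSERT_SOURCE)
    VALUES (%s, %s, %s, %s, NOW(), %s, %s, %s)
    ON CONFLICT (CODE) DO UPDATE SET
        NAME = EXCLUDED.NAME,
        CODE_GROUP = EXCLUDED.CODE_GROUP,
        INDICATOR = EXCLUDED.INDICATOR,
        UPDATE_SOURCE = EXCLUDED.UPDATE_SOURCE,
        IDD = EXCLUDED.IDD,
        INSERT_SOURCE = EXCLUDED.INSERT_SOURCE,
        UPDATE_TIMESTAMP = NOW(),
        -- TABLE_NAME.UPDATE_TIMESTAMP is the value stored before this update
        PREV_UPDATE_TIMESTAMP = TABLE_NAME.UPDATE_TIMESTAMP
"""

for index, row in dataframe.iterrows():
    cur.execute(upsert_sql, (row['CODE'], row['NAME'], row['CODE_GROUP'],
                             row['INDICATOR'], row['UPDATE_SOURCE'],
                             row['IDD'], row['INSERT_SOURCE']))
conn.commit()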

Error importing CSV With Python to MYSQL - TypeError: not enough arguments for format string

I'm facing a problem trying to insert a CSV file with 800 records into the database. It gives me the following error: MySQLdb.ProgrammingError: not enough arguments for format string. I have already checked that the placeholders and variables are correct. Could you help me figure out what the problem is? Here is the code:
import MySQLdb
import csv
conn = MySQLdb.connect(host="127.0.0.1", user="root", password="", database="csm")
cursor = conn.cursor()
csv_data = csv.reader(open('teste2.csv'))
header = next(csv_data)
for row in csv_data:
    print(row)
    cursor.execute(
        "INSERT INTO estoque1 (release, official, order, date, product, client, sales, sales, quant) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)", row)
conn.commit()
cursor.close()
I'm facing this error but I don't know how to solve it. Does anyone have tips on how I can solve it?
Because you're passing the whole list as one argument, while the execute() function expects the elements of the list as individual arguments.
You should be able to unpack the list like this:
cursor.execute(
    "INSERT INTO estoque1 (release, official, order, date, product, client, sales, sales, quant) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)",
    *row
)
Note the asterisk.
The problem is that you pass only one parameter (the row list) for the nine placeholders in your string. To fix that, you need to either convert the list to a tuple, use the * operator to unpack the list, or pass all the values of the list individually.
Use the * operator to unpack:
row = [1, 2, 3, 4]
"%s %s %s %s" % (*row,)
Use tuple:
row = [1, 2, 3, 4]
"%s %s %s %s" % tuple(row)
Or list all the parameters individually:
row = [1, 2, 3, 4]
"%s %s %s %s" % (row[0], row[1], row[2], row[3])

mixing placeholders, executemany, and table names

I can iterate through a Python object with the following code; however, I would like to be able to use placeholders for the schema and table name. Normally I do this with {}.{} and the .format() method, but how do you combine the two?
cur.executemany("INSERT INTO schema.table_name (x,y,z) "
"values (%s, %s, %s)", top_sample)
Depending on which Python version you use, you can try an f-string:
schema = "schema"
table_name = "table_name"
cur.executemany(f"INSERT INTO {schema}.{table_name} (x,y,z) values (%s, %s, %s)", top_sample)
Check PEP 498 -- Literal String Interpolation.
Another option is plain .format():
cur.executemany("INSERT INTO {schema}.{table_name} (x,y,z) values (%s, %s, %s)".format(schema=schema, table_name=table_name), top_sample)
but I find the first option shorter and cleaner
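One caution: both the f-string and .format() splice the names directly into the SQL string, so they should only be used with trusted, hard-coded values. If the driver here is psycopg2 (both psycopg2 and MySQLdb use %s placeholders), its sql module can quote identifiers safely; a minimal sketch, assuming the same schema, table_name, and top_sample:

from psycopg2 import sql

# sql.Identifier() safely quotes the schema and table names;
# the %s value placeholders are still handled by the driver.
query = sql.SQL("INSERT INTO {}.{} (x, y, z) VALUES (%s, %s, %s)").format(
    sql.Identifier(schema), sql.Identifier(table_name))
cur.executemany(query, top_sample)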
I'm not sure what the issue is. You can very well use format like this:
cur.executemany("INSERT INTO {}.{} (x,y,z) values (%s, %s, %s)".format('hello', 'world'), top_sample)
cur.executemany(
    """INSERT INTO schema.{table_name} (x,y,z) values (%s, %s, %s)""".format(table_name=your_table_name),
    top_sample
)
Place your table name in place of your_table_name.

mysql insert code produced following error: Not all parameters were used in the SQL statement

Below is my code for loading data into a MySQL table (7 columns of information):
#!/usr/bin/env python
import mysql.connector, csv, sys
csv.field_size_limit(sys.maxsize)
cnx = mysql.connector.connect(user='testuser', password= 'testuser', database='database')
cursor = cnx.cursor()
table = csv.reader(file("logs.txt"), delimiter='\t')
for row in table:
    cursor.execute("INSERT INTO first_table (column1, column2, column3, column4, column5, column6, column7) VALUES (%s, %s, %s, %s, %s, %s, %s)", row)
cnx.commit()
cnx.close()
Below is truncated content of the logs.txt file: 7 tab-delimited columns, where the last column (mouse_phene_id) may be empty or contain one or more space-delimited items:
human_gene_symbol entrez_id homolog_id hgnc_assoc mouse_gene_symbol mouse_mgi_id mouse_phene_id
A1BG 1 11167 yes A1bg MGI:2152878
A1CF 29974 16363 yes A1cf MGI:1917115 MP:0005387 MP:0005386 MP:0005388 MP:0005385 MP:0002873 MP:0010768 MP:0005369 MP:0005376 MP:0005384 MP:0005378
A2M 2 37248 yes A2m MGI:2449119
A3GALT2 127550 16326 yes A3galt2 MGI:2685279
A4GALT 53947 9690 yes A4galt MGI:3512453 MP:0005386 MP:0010768 MP:0005376
A4GNT 51146 87446 yes A4gnt MGI:2143261 MP:0005384 MP:0002006 MP:0005385 MP:0005381 MP:0005387
AAAS 8086 9232 yes Aaas MGI:2443767 MP:0005389 MP:0005386 MP:0005378
AACS 65985 11322 yes Aacs MGI:1926144
AADAC 13 37436 yes Aadac MGI:1915008
I get the following error. Since this is a common error, I tried everything posted on Stack Overflow related to it, but it is still unfixed and I am stuck:
Traceback (most recent call last):
  File "insert_mysql.py", line 23, in <module>
    cursor.execute("INSERT INTO first_table (column1, column2, column3, column4, column5, column6, column7) VALUES (%s, %s, %s, %s, %s, %s, %s)", row)
  File "/usr/local/lib/python2.7/dist-packages/mysql/connector/cursor.py", line 551, in execute
    "Not all parameters were used in the SQL statement")
mysql.connector.errors.ProgrammingError: Not all parameters were used in the SQL statement
Greatly appreciate any help, advice. Thank you in advance.
The error you are seeing is likely caused by a row in the logs.txt file which does not have the number of items (7) that your INSERT prepared statement is expecting. The error
Not all parameters were used in the SQL statement
means that you had one or more positional parameters which could not be assigned values, and hence were not used.
I can make two suggestions about where in the file the problem might be. First, your log file may have one or more header lines, which might not have the same number of columns or types of values the script expects. You can skip any number of initial lines like this:
skip_first_line = next(table)
skip_second_line = next(table)
# etc.
In other words, just call next() to consume any number of initial lines in the log file which do not contain the actual data you want to insert.
If not, then perhaps the problematic line is somewhere in the middle of the file. You can print each row before inserting it, to see where the script dies:
for row in table:
    print row
    cursor.execute("INSERT INTO first_table (column1, column2, column3, column4, column5, column6, column7) VALUES (%s, %s, %s, %s, %s, %s, %s)", row)
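If the failing rows turn out to be ones where mouse_phene_id is empty and there is no trailing tab (so the split yields fewer than 7 fields), one fix is to pad each row before inserting; a sketch, assuming the same table and cursor as above:

for row in table:
    # Pad short rows with empty strings so there is one value per placeholder.
    row = row + [''] * (7 - len(row))
    cursor.execute("INSERT INTO first_table (column1, column2, column3, column4, column5, column6, column7) VALUES (%s, %s, %s, %s, %s, %s, %s)", row)
cnx.commit()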

psycopg2: Insert a numpy array of strings into a PostgreSQL table

Please consider me a complete novice with psycopg2. My aim is to insert a 1D numpy array of dtype object (where the elements are only strings) into a PostgreSQL table. My main program saves the fields as strings in the numpy array. I then want to add each separate element to a column in the PostgreSQL table (or, if you prefer, the 1D array is one row). Please note, the actual array has 36 elements! I need a method to put them all in.
I am using the cur.execute command, although I believe there is some problem with the string conversion.
Array=np.empty(3,dype=object)
Array[0]='Hello'
Array[1]='Tea?'
Array[2]='Bye'
statement= "INSERT INTO testlog (field1,field2,field3) VALUES (%s)" #Etc.
cur.execute(statement,Array)
I get error:
cur.execute(statement,Array)
TypeError: not all arguments converted during string formatting
Also tried:
cur.executemany('INSERT INTO testlog VALUES ( %s )', [[v] for v in Array])
Thanks
Your statement should contain a placeholder for each value:
statement= "INSERT INTO testlog (field1,field2,field3) VALUES (%s, %s, %s)"
For Example:
=# create table testlog (field1 varchar(50), field2 varchar(50), field3 varchar(50));
Then in the Python shell (note dtype, not dype):
Array=np.empty(3,dtype=object)
Array[0]='Hello'
Array[1]='Tea?'
Array[2]='Bye'
sql = "INSERT INTO testlog (field1, field2, field3) VALUES (%s, %s, %s)"
cur.execute(sql, [f for f in Array])
conn.commit()
And in DB:
select * from testlog;
field1 | field2 | field3
--------+--------+--------
Hello | Tea? | Bye
(1 row)
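Since the real array has 36 elements, writing 36 placeholders by hand is error-prone; the placeholder list can be generated from the array length instead. A minimal sketch, assuming the table's columns are named field1 through field36 (adjust to your real column names):

# Build "%s, %s, ..., %s" with one placeholder per array element.
placeholders = ", ".join(["%s"] * len(Array))
columns = ", ".join("field{0}".format(i + 1) for i in range(len(Array)))

sql = "INSERT INTO testlog ({0}) VALUES ({1})".format(columns, placeholders)
cur.execute(sql, [f for f in Array])
conn.commit()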
