Why is my second database column being populated with 'None'? - python

I am trying to create a database with three columns: URL, which is the location of the data I am aiming to scrape; TICKER, which is the ticker symbol of the stock; and STATUS, which will be used to record whether the data has been acquired yet or not.
import sqlite3
conn = sqlite3.connect('tickers.db')
conn.execute('''CREATE TABLE TAB(URL, TICKER, STATUS default "Not started");''')
for i in url_list:
    conn.execute("INSERT INTO TAB(URL) VALUES(?)", (i,))
for j in ticklist:
    conn.execute("INSERT INTO TAB(TICKER) VALUES(?)", (j,))
for row in conn.execute("SELECT URL, TICKER, STATUS from TAB"):
    print('URL={i}'.format(i=row[0]))
    print('TICKER={i}'.format(i=row[1]))
    print('STATUS={i}'.format(i=row[2]))
To populate the URL column I have used a list of URLs, and I am trying to do the same thing with TICKER; however, when I run the code, the TICKER column is populated with None for every row.
Output
URL=https://api.pushshift.io/reddit/search/submission/?q=$AACG&subreddit=wallstreetbets&metadata=true&size=0&after=1610928000&before=1613088000
TICKER=None
STATUS=Not started
URL=https://api.pushshift.io/reddit/search/submission/?q=$AACIU&subreddit=wallstreetbets&metadata=true&size=0&after=1610928000&before=1613088000
TICKER=None
STATUS=Not started

Instead of trying to populate the columns separately, why not insert complete rows directly?
Assuming url_list and ticklist are of equal length (and even if not), you can try this:
for i, j in zip(url_list, ticklist):
    conn.execute("INSERT INTO TAB(URL, TICKER) VALUES(?,?)", (i, j))
That way you are adding the values as expected instead of creating a new row with every insert.
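For completeness, a minimal end-to-end version might look like this (a sketch; note the conn.commit(), without which the inserts are not persisted when the connection closes):
import sqlite3

conn = sqlite3.connect('tickers.db')
conn.execute('''CREATE TABLE TAB(URL, TICKER, STATUS default "Not started");''')
# One row per (URL, TICKER) pair; STATUS falls back to its default value.
for i, j in zip(url_list, ticklist):
    conn.execute("INSERT INTO TAB(URL, TICKER) VALUES(?,?)", (i, j))
conn.commit()  # persist the inserts before the connection closes
for row in conn.execute("SELECT URL, TICKER, STATUS from TAB"):
    print('URL={i}'.format(i=row[0]))
    print('TICKER={i}'.format(i=row[1]))
    print('STATUS={i}'.format(i=row[2]))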

Related

How to add headers when selecting items from an SQL table?

My SQL query simply selects all of the items in the table and prints each entry one under the other. I am wondering if there is a way to print headers above the items so the user can tell what each item in the table means. Here is my current query if needed:
c.execute("SELECT * FROM outflows1")
data = c.fetchall()
print(data)
print("Currently in database:")
for row in data:
    print(row)
conn.commit()
So this query outputs all of the items in my database; however, I am wondering if there is a way to put headers above the output to label what each item is. Thanks
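One way to do this (a sketch, reusing the cursor c from the question): after a SELECT, the DB-API cursor exposes the column names as the first element of each tuple in cursor.description.
c.execute("SELECT * FROM outflows1")
headers = [col[0] for col in c.description]  # column names for this result set
print(" | ".join(headers))
for row in c.fetchall():
    print(" | ".join(str(item) for item in row))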

How to update a row element of an SQL database, where its value depends on another element of the same row, via a Python function?

Is there any way to use a Python function in the SQL statement?
I have a Python function returning a value based on its input:
newvalue = myfunc(input_value)
I have the SQL table "users" with columns:
id, name, some_value_of_user, value_to_be_calculated
I would like to update (all) rows using the function myfunc, without, for each row, extracting some_value_of_user, calling myfunc(some_value_of_user), and then separately updating the row's value_to_be_calculated.
I would expect something like this:
mycursor.execute(
"UPDATE users SET value_to_be_calculated = %s WHERE value_to_be_calculated = NULL",
myfunc(some_value_of_user)
)
Thank you!
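SQL itself cannot call a Python function, so one workaround is to do the round trip explicitly: fetch the inputs, compute the new values in Python, and write them back in one batch. A sketch, assuming a MySQL-style driver with %s placeholders, the myfunc above, and a connection object hypothetically named mydb (note that SQL needs IS NULL rather than = NULL):
mycursor.execute(
    "SELECT id, some_value_of_user FROM users WHERE value_to_be_calculated IS NULL"
)
# Compute the new value for each row in Python, keyed by the row's id.
updates = [(myfunc(some_value), row_id) for row_id, some_value in mycursor.fetchall()]
mycursor.executemany(
    "UPDATE users SET value_to_be_calculated = %s WHERE id = %s", updates
)
mydb.commit()  # hypothetical connection object; commit persists the updates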

How to make item ID numbering automatic?

I'm trying to insert some data into an SQL database, and the problem is that I'm really green at this. So the MAIN problem is: how can I keep all the items in the table in order? I have three main columns: ID, CARNUM, TIME. But in this insertion I have to type the ID manually. How can I make the system create a numeric ID automatically?
Here's the insertion code:
postgres_insert_query = """ INSERT INTO Vartotojai (ID, CARNUM, TIME) VALUES (%s,%s,%s)"""
record_to_insert = (id, car_numb, Reg_Tikslus_Laikas)
cursor.execute(postgres_insert_query, record_to_insert)
connection.commit()
count = cursor.rowcount
print (count, "Record inserted successfully into mobile table")
[screenshots: pgAdmin sort dialog and the table in pgAdmin]
You could change the data type of ID to serial, which is an auto-incrementing integer. That means you don't have to enter an ID manually when inserting into the database.
Read more about the serial data type in the PostgreSQL documentation.
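A minimal sketch of what that looks like, assuming psycopg2 and that the table can be recreated (the column types are guesses; the names come from the question):
# With ID declared as SERIAL, PostgreSQL assigns the next integer automatically.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS Vartotojai (
        ID     SERIAL PRIMARY KEY,
        CARNUM TEXT,
        TIME   TIMESTAMP
    )
""")
# Omit ID from the insert; the database fills it in.
cursor.execute(
    "INSERT INTO Vartotojai (CARNUM, TIME) VALUES (%s, %s)",
    (car_numb, Reg_Tikslus_Laikas),
)
connection.commit()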

BigQuery insert/delete to a table

I have a table X in BigQuery with 170,000 rows. The values in this table are based on complex calculations done on the values from a table Y. These are done in Python so as to automate the ingestion when Y gets updated.
Every time Y updates, I recompute the values needed for X in my script and insert them via streaming, using the function below:
def stream_data(table, json_data):
    data = json.loads(str(json_data))
    # Reload the table to get the schema.
    table.reload()
    rows = [data]
    errors = table.insert_data(rows)
    if not errors:
        print('Loaded 1 row into {}'.format(table))
    else:
        print('Errors:')
The problem here is that I have to delete all rows in the table before I insert. I know a query to do this, but it fails because BigQuery does not allow DML on a table while it has a streaming buffer, and apparently the buffer lingers for about a day.
Is there a workaround where I can delete all rows in X, recompute based on Y, and then insert the new values using the code above? Possibly by turning the streaming buffer off?
Another option would be to drop the whole table and recreate it. But my table is huge, with 60 columns, so the JSON for the schema would be huge. I couldn't find samples of creating a new table with a schema passed from JSON or a file; some samples of this would be great (see the sketch below).
A third option is to make the streaming insert smart, so that it does an update instead of an insert when a row has changed. This is again a DML operation and runs into the original problem.
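On the drop-and-recreate option, here is a sketch of building the schema from a JSON file, using the same (older) google-cloud-bigquery client API as the code in this question; the file name and field layout are assumptions:
import json
from google.cloud import bigquery

# Hypothetical schema file: a JSON list of
# {"name": ..., "type": ..., "mode": ...} objects.
with open("schema.json") as f:
    fields = json.load(f)

schema = [
    bigquery.SchemaField(fld["name"], fld["type"], mode=fld.get("mode", "NULLABLE"))
    for fld in fields
]

bigquery_client = bigquery.Client("myproject")
dataset = bigquery_client.dataset("mydataset")
table = dataset.table("test", schema)
table.create()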
UPDATE:
Another approach I tried is to delete the table and recreate it. Before deleting, I copy the schema so I can set it on the new table:
def stream_data(json_data):
    bigquery_client = bigquery.Client("myproject")
    dataset = bigquery_client.dataset("mydataset")
    table = dataset.table("test")
    data = json.loads(json_data)
    schema = table.schema
    table.delete()
    table = dataset.table("test")
    # Set the table schema
    table = dataset.table("test", schema)
    table.create()
    rows = [data]
    errors = table.insert_data(rows)
    if not errors:
        print('Loaded 1 row ')
    else:
        print('Errors:')
This gives me an error:
ValueError: Set either 'view_query' or 'schema'.
UPDATE 2:
The key was to call table.reload() before schema = table.schema to fix the above!
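In other words, the locally constructed table object has an empty schema until it is reloaded from the service. A sketch of the relevant lines, using the same (older) client API as above:
table = dataset.table("test")
table.reload()           # fetch metadata, including the schema, from BigQuery
schema = table.schema    # now populated, so create() no longer raises the ValueError
table.delete()
table = dataset.table("test", schema)
table.create()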

Put data retrieved from a MySQL query into a pandas DataFrame with a for loop

I have one database with two tables, and both have a column called barcode. The aim is to retrieve a barcode from one table and then search the other table, where extra information for that barcode is stored. I would like both sets of retrieved data to be saved in a DataFrame. The problem is that when I try to insert the retrieved data from the second query into a DataFrame, it stores only the last entry:
import mysql.connector
import pandas as pd
cnx = mysql.connector.connect(user=user, password=password, host=host, database=database)
query_barcode = ("SELECT barcode FROM barcode_store")
cursor = cnx.cursor()
cursor.execute(query_barcode)
data_barcode = cursor.fetchall()
Up to this point everything works smoothly, and here is the part with problem:
query_info = ("SELECT product_code FROM product_info WHERE barcode=%s")
for each_barcode in data_barcode:
    cursor.execute(query_info % each_barcode)
    pro_info = pd.DataFrame(cursor.fetchall())
pro_info contains only the information for the last matching barcode, while I want to retrieve the information for every match in data_barcode.
That's because you are overwriting pro_info with new data on each loop iteration. You should rather do something like:
query_info = ("SELECT product_code FROM product_info")
cursor.execute(query_info)
pro_info = pd.DataFrame(cursor.fetchall())
Making so many SELECTs is redundant, since you can get all records in one SELECT and insert them into your DataFrame at once.
#edit: However, if you need the WHERE clause to fetch only specific products, you need to store the records in a list until you insert them into the DataFrame. So your code will eventually look like:
pro_list = []
query_info = ("SELECT product_code FROM product_info WHERE barcode=%s")
for each_barcode in data_barcode:
    cursor.execute(query_info % each_barcode)
    pro_list.append(cursor.fetchone())
pro_info = pd.DataFrame(pro_list)
Cheers!
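As a side note on the snippets above: interpolating values into SQL with % is fragile and open to injection; mysql.connector can bind the value itself when it is passed as a separate parameter. A sketch, reusing the names from the answer:
pro_list = []
query_info = "SELECT product_code FROM product_info WHERE barcode = %s"
for each_barcode in data_barcode:
    # each_barcode is already a 1-tuple from fetchall(), so it can serve as
    # the parameter sequence; the driver handles quoting and escaping.
    cursor.execute(query_info, each_barcode)
    pro_list.append(cursor.fetchone())
pro_info = pd.DataFrame(pro_list)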
