How to update certain group of values in single column using SQL - python

I'm brand new to Python and to updating tables using SQL. I would like to ask how to update a certain group of values in a single column using SQL. Please see the example below:
id
123
999991234
235
789
200
999993456
I need to add the missing prefix '99999' to the records that lack it. The id column has an integer data type by default. I've tried the SQL statement below, but I get a conflict between data types, which I tried to resolve with a cast:
update tablename
set id = concat('99999', cast(id as string))
where id not like '99999%';

To use the LIKE operator and the CONCAT() function, the operands must be of type STRING or BYTES. In this case, you need to cast id to STRING in the WHERE clause condition as well, and cast the concatenated value in the SET clause back to INTEGER.
Using your sample data, I ran this update script:
UPDATE mydataset.my_table
SET id = CAST(CONCAT('99999', CAST(id AS STRING)) AS INTEGER)
WHERE CAST(id as STRING) NOT LIKE '99999%'
Result: the rows were updated successfully.
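The cast-on-both-sides idea can be tried locally. The following is only a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database, so the syntax is an assumption that differs slightly from the answer: TEXT replaces STRING and the || operator replaces CONCAT(). The table name my_table is borrowed from the answer and the rows are the question's sample ids.

```python
import sqlite3

# In-memory SQLite database used purely as a local stand-in (assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (id INTEGER)")
con.executemany(
    "INSERT INTO my_table (id) VALUES (?)",
    [(123,), (999991234,), (235,), (789,), (200,), (999993456,)],
)

# Cast id to text for both the LIKE test and the concatenation,
# then cast the result back to an integer before storing it.
con.execute(
    """
    UPDATE my_table
    SET id = CAST('99999' || CAST(id AS TEXT) AS INTEGER)
    WHERE CAST(id AS TEXT) NOT LIKE '99999%'
    """
)

ids = [row[0] for row in con.execute("SELECT id FROM my_table ORDER BY rowid")]
print(ids)
```

Rows that already start with 99999 are skipped by the WHERE clause, so running the update twice does not add the prefix a second time.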

Related

How can I parse a row's column value passed to a UDF when mapping a column?

I have a dataframe like this. For the sake of simplicity I'm showing just 2 columns, both strings, but in real life it will have more columns, of types other than string:
SQLText | TableName
select * from sourceTable; | NewTable
select * from sourceTable1; | NewTable1
I also have a custom function in which I want to iterate over the dataframe, get the SQL, and run it to create a table. However, I'm not passing each column individually, but rather the whole row:
def CreateTables(rowp):
    df = spark.sql(rowp.SQLText)
    #code to create table using rowp.TableName
This is my code, I first clean up SQLText because it's stored in another table and then I run the UDF on the column:
l = l.withColumn("SQLText", F.lit(F.regexp_replace(F.col("SQLText").cast("string"), "[\n\r]", " ")))
nt = l.select(l["*"]).withColumn("TableName",CreateTables(F.struct(*list(l.columns)) )).select("TableName","SQLText")
nt.show(truncate=False)
So when I run the code above, it errors out because instead of resolving rowp.SQLText to its literal value, it passes its type:
Column<'struct(SourceSQL, TableName)[SourceSQL]'>
So in the CreateTables function, when spark.sql(rowp.SQLText) is executed I expect the following:
df = spark.sql("select * from sourceTable;")
but instead this is happening: the variable's type is literally being sent instead of its value:
df = spark.sql("Column<'struct(SourceSQL, TableName)[SourceSQL]'>")
I've tried numerous solutions: getItem, getField, get, getAs but no luck yet.
I've also tried using indexes like rowp[0] but it just changes the variable type passed to the spark.sql function:
Column<'struct(SourceSQL, TableName)[0]'>
If I try rowp(0) it gives me a Column is not callable error.
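There are many ways to do this. As written, CreateTables appears to be a plain Python function that is called once with a Column expression while the query is being built, not once per row, which is why rowp.SQLText is a Column rather than a string. Also, spark.sql can only be called on the driver, not from inside a UDF, so collect the rows to the driver and loop over them.
Here is one way I tested in PySpark 3.2.3:
rows = df.rdd.collect()
for i in range(len(rows)):
    spark.sql(rows[i][0])  # index 0 is the SQLText column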

How can I get a MySQL database to insert a default value if there's an attempt to insert a null value with Python?

I've read answers that do something similar but not exactly what I'm looking for, which is: attempting to insert a row with a NULL value in a column will result instead in that column's DEFAULT value being inserted.
I'm trying to process a large number of inserts in the mySQL Python connector with a large number of column values that I don't want to deal with individually, and none of the typical alternatives work here. Here is a sketch of my code:
qry = "INSERT INTO table (col1, col2, ...) VALUES (%s, %s, ...)"
row_data_dict = defaultdict(lambda : None, {...})
params = []
for col in [col1, col2, ...]:
    params.append(row_data_dict[col])
cursor.execute(qry, tuple(params))
My main problem is that setting None as the default in the dictionary results in either NULL being inserted or an error if I specify the column as NOT NULL. I have a large number of columns that might change in the future, so I'd want to avoid setting different 'default' values for different entries if at all possible.
I can't do the typical way of inserting DEFAULT by skipping over columns on the insert because while those columns might have the DEFAULT value, I can't guarantee it and considering I'm doing a large number of inserts I don't want to change the query string each time I insert depending on if it's default or not.
The other way of inserting DEFAULT seems to be to have DEFAULT as one of the parameters (e.g. INSERT INTO table (col1,...) VALUES (DEFAULT,...)) but in my case setting the default in the dictionary to 'DEFAULT' results in error (mySQL complains about it being an incorrect integer value on trying to insert into an integer column, making it seem like it's interpreting the default as a string and not a keyword).
This seems like it would be a relatively common use case, so it kind of shocks me that I can't figure out a way to do this. I'd appreciate any way to do this or get around it that I haven't already listed here.
EDIT: All of the relevant columns are already defined with a DEFAULT value, but it doesn't seem to actually replace NULL (or Python's None) when it's inserted.
EDIT 2: The reason why I want to avoid NULL so badly is because NULL != NULL and I want to have unique rows, so that if there's one row (1, 2, 3, 'Unknown'), INSERT IGNORE'ing a row (1, 2, 3, 'Unknown') won't insert it. With NULL you end up with a bunch of copies of the same record because one of the values is unknown.
You can use the DEFAULT() function in the VALUES list to specify that the default value for the column should be used. And you can put this in an IFNULL() call so it will be used when the supplied value is NULL.
qry = """INSERT INTO table (col1, col2, ...)
VALUES (IFNULL(%s, DEFAULT(col1)), IFNULL(%s, DEFAULT(col2)), ...)"""
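With many columns, the IFNULL(%s, DEFAULT(col)) pattern can be generated instead of typed by hand. The sketch below is one possible way to do that, not part of the original answer: build_insert_with_defaults and row_params are hypothetical helper names, and my_table and the column names are placeholders. Identifiers are interpolated directly into the SQL, so the column list is assumed to come from trusted code, never from user input.

```python
def build_insert_with_defaults(table, columns):
    """Build an INSERT whose placeholders fall back to each column's DEFAULT.

    Hypothetical helper: table and column names must be trusted identifiers.
    """
    col_list = ", ".join(f"`{c}`" for c in columns)
    values = ", ".join(f"IFNULL(%s, DEFAULT(`{c}`))" for c in columns)
    return f"INSERT INTO `{table}` ({col_list}) VALUES ({values})"


def row_params(row, columns):
    """Return the parameter tuple for one row; missing keys become None."""
    return tuple(row.get(c) for c in columns)


columns = ["col1", "col2", "col3"]  # placeholder column names
qry = build_insert_with_defaults("my_table", columns)
params = row_params({"col1": 1, "col3": "x"}, columns)

print(qry)
print(params)
# With a real connection this would be: cursor.execute(qry, params)
```

The query string stays the same for every row, so it can be built once and reused across a large batch of inserts.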
Welcome to Stack Overflow. What you need to do is add a default value in your database for the column that should have one. When you create your table, put DEFAULT and the value after the column's type, like this:
CREATE TABLE `yourTable` (`id` INT DEFAULT 0, .....)
If you have already created the table and need to alter the existing column, you would do something like this:
ALTER TABLE `yourTable` MODIFY `id` INT DEFAULT 0
Keep in mind, though, that MySQL applies the default only when the column is left out of the INSERT or when the DEFAULT keyword is given in its place. An explicit NULL (None from Python) is stored as NULL, or raises an error if the column is NOT NULL.
Another thing to keep in mind is that the number of values has to match the number of columns listed. Say you have a table with 3 columns; we'll call them colA, colB and colC.
If you want to insert a row with colA_value for colA, the default for colB, and colC_value for colC, you still need to supply 3 values, with DEFAULT standing in for colB:
INSERT INTO `yourTable` (`colA`, `colB`, `colC`)
VALUES
('colA_value', DEFAULT, 'colC_value')
If you only pass 2 values to MySQL, the insert statement under the hood will look like this:
INSERT INTO `yourTable` (`colA`, `colB`, `colC`)
VALUES
('colA_value', 'colC_value')
and MySQL rejects it because the column count doesn't match the value count.
If you are already passing the right number of values, that is another story. Please let me know so I can help you troubleshoot further if needed.

What is the correct way to use distinct on (Postgres) with SqlAlchemy?

I want to get all the columns of a table with max(timestamp) and group by name.
What I have tried so far is:
normal_query ="Select max(timestamp) as time from table"
event_list = normal_query \
.distinct(Table.name)\
.filter_by(**filter_by_query) \
.filter(*queries) \
.group_by(*group_by_fields) \
.order_by('').all()
The query I get:
SELECT DISTINCT ON (schema.table.name) , max(timestamp)....
This query basically returns two columns, name and timestamp,
whereas the query I want is:
SELECT DISTINCT ON (schema.table.name) * from table order by ....
which returns all the columns in that table. That is the expected behavior, and I am able to get all the columns that way. How could I write it in Python to get this statement? Basically, the asterisk is missing.
Can somebody help me?
What you seem to be after is the DISTINCT ON ... ORDER BY idiom in PostgreSQL for selecting greatest-n-per-group results (N = 1). So instead of grouping and aggregating, just query the whole entity and order by the timestamp within each name:
event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    all()
This will end up selecting rows "grouped" by name, having the greatest timestamp value.
You do not want to use the asterisk most of the time, not in your application code anyway, unless you're doing manual ad-hoc queries. The asterisk is basically "all columns from the FROM table/relation", which might then break your assumptions later, if you add columns, reorder them, and such.
In case you'd like to order the resulting rows based on timestamp in the final result, you can use for example Query.from_self() to turn the query into a subquery, and order in the enclosing query (note that from_self() is deprecated as of SQLAlchemy 1.4):
event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    from_self().\
    order_by(Table.timestamp.desc()).\
    all()

count how many times in an sqlite3 database table column the values occurs

I have been performing a query to count how many times in my sqlite3 database table (Users), within the column "country", the value "Australia" occurs.
australia = db.session.query(Users.country).filter_by(country="Australia").count()
I need to do this in a more dynamic way for any country value that may be within this column.
I have tried the following but unfortunately I only get a count of 0 for all values that are passed in the loop variable (each).
country = list(db.session.query(Users.country))
country_dict = list(set(country))
for each in country_dict:
    print(db.session.query(Users.country).filter_by(country=(str(each))).count())
Any assistance would be greatly appreciated.
The issue is that country is a list of result tuples, not a list of strings. The end result is that the value of str(each) is something along the lines of ('Australia',), which should make it obvious why you are getting counts of 0 as results.
When you want a plain list of single-column values, unpack each one-element result tuple. When you want distinct results, use DISTINCT in SQL.
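The tuple issue is easy to see in isolation. This is a small illustrative sketch in plain Python, where the rows list is made-up data standing in for what the single-column query returns:

```python
# Made-up stand-in for the result of db.session.query(Users.country):
# each result row is a one-element tuple, not a bare string.
rows = [("Australia",), ("Japan",), ("Australia",)]

# str() of a row gives the tuple's text form, which never matches a country.
as_text = str(rows[0])

# Unpacking each one-element tuple yields the plain strings.
countries = [country for (country,) in rows]
distinct_countries = sorted(set(countries))

print(as_text)
print(distinct_countries)
```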
But you should not first query distinct countries and then fire a query to count the occurrence of each one. Instead use GROUP BY:
country_counts = db.session.query(Users.country, db.func.count()).\
group_by(Users.country).\
all()
for country, count in country_counts:
    print(country, count)
The main thing to note is that SQLAlchemy does not hide the SQL when using the ORM, but works with it.
If you can use the sqlite3 module with direct SQL it is a simple query:
curs = con.execute("SELECT COUNT(*) FROM users WHERE country=?", ("Australia",))
nb = curs.fetchone()[0]
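The GROUP BY approach works the same way with the sqlite3 module directly. A minimal sketch, assuming an in-memory database with a users table and a few made-up rows:

```python
import sqlite3

# In-memory database with made-up sample rows (assumption for illustration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)")
con.executemany(
    "INSERT INTO users (country) VALUES (?)",
    [("Australia",), ("Japan",), ("Australia",), ("Chile",)],
)

# One query returns the count for every country at once.
country_counts = dict(
    con.execute("SELECT country, COUNT(*) FROM users GROUP BY country")
)

for country, count in sorted(country_counts.items()):
    print(country, count)
```

This avoids issuing one COUNT query per distinct country.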

Replacing NULLS by some value in a SQL table

I have a table in SQLite3 database (using Python), Tweet Table (TwTbl) that has some values in the column geo_id. Most of the values in this column are NULL\None. I want to replace/update all NULLS in the geo_id column of TwTbl by a number 999. I am not sure about the syntax. I am trying the following query, but I am getting an error ("No such Column: None")
c.execute("update TwTbl SET geo_id = 999 where geo_id = None").fetchall()
I even tried using Null instead of None, that did not give any errors but did not do any update.
Any help will be appreciated.
As an answer, so that you can accept it if you're inclined.
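You need IS NULL instead of = NULL. NULL is a special value that's indeterminate, and a comparison with it is neither true nor false in most database implementations, so = NULL never matches a row. None is not an SQL keyword at all, so SQLite reads it as a column name, hence the "No such column: None" error.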
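The difference between = NULL and IS NULL can be checked with a small script. This is a sketch under assumptions: the table uses the question's name TwTbl and column geo_id, while the tweet_id column and the sample rows are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# tweet_id is a hypothetical key column added so rows can be ordered.
con.execute("CREATE TABLE TwTbl (tweet_id INTEGER PRIMARY KEY, geo_id INTEGER)")
con.executemany("INSERT INTO TwTbl (geo_id) VALUES (?)", [(None,), (42,), (None,)])

# "= NULL" is never true, so this statement matches no rows.
no_match = con.execute(
    "UPDATE TwTbl SET geo_id = 999 WHERE geo_id = NULL"
).rowcount

# "IS NULL" is the correct test and updates the two NULL rows.
updated = con.execute(
    "UPDATE TwTbl SET geo_id = 999 WHERE geo_id IS NULL"
).rowcount
con.commit()

geo_ids = [g for (g,) in con.execute("SELECT geo_id FROM TwTbl ORDER BY tweet_id")]
print(no_match, updated, geo_ids)
```

An UPDATE returns no rows, so there is nothing to fetch afterwards; the cursor's rowcount reports how many rows were changed.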
