I am writing a Python script that converts a CSV file into an sqlite3 database. There is an id column that I have set up to be "primary key unique", and I know there is repeating information in the CSV file. How do I tell it to store only non-repeating information in the database?
Here is what I have so far:
for row in reader:
    counter += 1
    # this gets rid of the header in the CSV file
    if counter == 1:
        continue
    s = (row[0], row[2], row[1], row[4], row[3], row[7], row[8], row[9])
    course = row[5].split(" ")
    c = (row[0], course[0], course[1], row[6])
    # when it hits here and sees that two ids are the same, it crashes
    # because it will not allow non-unique values.
    curs.execute('''insert into students (id,lastname,firstname,major,email,city,state,zip)
                    values (?,?,?,?,?,?,?,?)''', s)
    curs.execute('''insert into classes (id,subjcode,coursenumber,termcode)
                    values (?,?,?,?)''', c)
I would really appreciate the help.
You could use INSERT OR IGNORE:
curs.execute('''INSERT OR IGNORE INTO students (id,lastname,firstname,major,email,city,state,zip) VALUES (?,?,?,?,?,?,?,?)''', s)
This will insert the first row with a duplicate id, but ignore all successive duplicates.
You can do that by using a UNIQUE constraint on your tables' id column, and then using an INSERT OR IGNORE.
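A minimal sketch of the two combined, assuming the column names from the question (the real schema wasn't shown, and the filename here is hypothetical):

import sqlite3

con = sqlite3.connect("students.db")  # hypothetical filename
curs = con.cursor()

# id is the primary key, so a second row with the same id would violate
# the constraint; INSERT OR IGNORE skips such rows instead of raising.
curs.execute('''CREATE TABLE IF NOT EXISTS students
                (id TEXT PRIMARY KEY,
                 lastname TEXT, firstname TEXT, major TEXT,
                 email TEXT, city TEXT, state TEXT, zip TEXT)''')

s = ('1001', 'Doe', 'Jane', 'CS', 'jdoe@example.com', 'Springfield', 'IL', '62701')
curs.execute('''INSERT OR IGNORE INTO students
                (id,lastname,firstname,major,email,city,state,zip)
                VALUES (?,?,?,?,?,?,?,?)''', s)
con.commit()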
Say I have 100 different integers that I want to store as a row with 100 columns.
I am trying it like this:
db = sqlite3.connect("test.db")
c = db.cursor()
c.execute('''
    CREATE TABLE IF NOT EXISTS nums(
    id INTEGER PRIMARY KEY,
    ''')
for i in range(100):
    c.execute('''
        ALTER TABLE nums
        ADD ''' + 'column_' + i + '''INTEGER''')
db.commit()
Someone told me that when you are using numbers as column names, there is probably a better way to do it. But if I have, for example, a list of strings in Python, and I want to loop through it and store every individual string in its own column, the approach would be the same, right?
However, this code runs without errors for me, yet no new table is created. How come?
Your CREATE TABLE statement is incomplete: it ends with a trailing comma and never closes the parenthesis, so the nums table is never created. Your ALTER statement is also broken: it concatenates the integer i to a string, which raises a TypeError, and it is missing a space before INTEGER. You can use the following:
for i in range(100):
    c.execute(f'ALTER TABLE nums ADD COLUMN column_{i} INTEGER')
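If you are creating the table from scratch anyway, you could also build the whole column list up front in a single CREATE TABLE instead of issuing 100 ALTERs; a minimal sketch:

import sqlite3

db = sqlite3.connect("test.db")
c = db.cursor()

# generate "column_0 INTEGER, column_1 INTEGER, ..." in one pass
cols = ", ".join(f"column_{i} INTEGER" for i in range(100))
c.execute(f"CREATE TABLE IF NOT EXISTS nums (id INTEGER PRIMARY KEY, {cols})")
db.commit()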
I'm trying to store my JSON file in the popNames DB, but an error pops up.
My JSON file is a dictionary with the country as the key and the person names as the value. In my DB table I want the country in the first column as the primary key, and the names in the subsequent columns.
Could anyone help me with this?
Every INSERT call creates a new row in the PopNamesDB table. Your code creates many such rows: the first row has a country but NULL for all the other columns. The next N rows each have a null country, a value for colName, and NULL for all the other columns.
An easy way to fix your code is to change your follow-up INSERT calls (on line 109) so that they modify the row you created earlier, instead of creating new rows. The query will look something like:
cur.execute(''' UPDATE PopNamesDB SET ''' + colName + ''' = ? WHERE country = ?''', (y, c))
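Put together, a minimal sketch of the whole loop, assuming the JSON maps each country to a list of names and the table has columns country, name_0, name_1, ... (the file, table, and column names here are hypothetical, since the original code was only posted as a screenshot):

import json
import sqlite3

con = sqlite3.connect("popNames.db")   # hypothetical database name
cur = con.cursor()

with open("popNames.json") as f:       # hypothetical file name
    data = json.load(f)

for country, names in data.items():
    # create one row per country ...
    cur.execute("INSERT INTO PopNamesDB (country) VALUES (?)", (country,))
    for i, name in enumerate(names):
        # ... then fill in its name columns in place, rather than
        # inserting a new row per name
        cur.execute(f"UPDATE PopNamesDB SET name_{i} = ? WHERE country = ?",
                    (name, country))
con.commit()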
I'm hoping to duplicate my techniques for looping through tables in R using Python in the ArcGIS/arcpy framework. Specifically, is there a practical way to loop through the rows of an attribute table using Python and copy that data based on the values from previous rows?
For example, using R I would use code similar to the following to copy rows of data from one table that have unique values for a specific variable:
## table name: data
## variable of interest: variable
## new table: new.data
for (i in 1:nrow(data))
{
    if (data$variable[i] != data$variable[i-1])
    {
        rbind(new.data, data[i,])
    }
}
If I've written the above code correctly, then in words: this for-loop simply checks whether the current value in a table differs from the previous value, and adds all column values for that row to the new table if it is in fact a new value. Any help with this thought process would be great.
Thanks!
To just get the unique values in a field of a table in arcpy:
import arcpy
table = "mytable"
field = "my_field"
# ArcGIS 10.0
unique_values = set(row.getValue(field) for row in iter(arcpy.SearchCursor(table).next, None))
# ArcGIS 10.1+
unique_values = {row[0] for row in arcpy.da.SearchCursor(table, field)}
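If, as in the R example, you also want to copy the rows carrying those first-seen values into a new table, one approach is to pair a search cursor with an insert cursor. This is only a sketch: it assumes the output table new_table already exists with matching fields, and it skips the OID and geometry fields, which can't be written directly.

import arcpy

table = "mytable"
out_table = "new_table"  # assumed to already exist with matching fields
fields = [f.name for f in arcpy.ListFields(table)
          if f.type not in ('OID', 'Geometry')]
key = fields.index("my_field")

seen = set()
with arcpy.da.SearchCursor(table, fields) as search:
    with arcpy.da.InsertCursor(out_table, fields) as insert:
        for row in search:
            # copy a row only the first time its key value appears
            if row[key] not in seen:
                seen.add(row[key])
                insert.insertRow(row)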
Yes, to loop through values in a table using arcpy, you want to use a cursor. It's been a while since I've used arcpy, but if I recall correctly the one you want is a search cursor. In its simplest form, this is what it would look like:
import arcpy

curObj = arcpy.SearchCursor(r"C:/shape.shp")
row = curObj.next()
while row:
    columnValue = row.getValue("columnName")
    row = curObj.next()
As of version 10.1 (I think) they introduced a data access cursor, which is orders of magnitude faster. Data access (DA) cursors require you to declare which columns you want returned when you create the cursor. Example:
import arcpy

columns = ['column1', 'something', 'someothercolumn']
curObj = arcpy.da.SearchCursor(r"C:/somefile.shp", columns)
for row in curObj:
    print('column1 is', row[0])
    print('someothercolumn is', row[2])
I have written a simple script that prints out and adds the name of a table and its associated column headings to a Python list:
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        b.append(field.name + "," + fc)
print(b)
In each table there are a number of column headings, and there are many instances where one or more tables contain the same column headings. I want to do a bit of a reverse of this: a Python dictionary instead of a list, where the keys are the column headings and the values are the table names. My idea is to find all the tables that each column heading lies within.
I've been playing around all afternoon and I think I am overthinking this, so I came here for some help. If anyone can suggest how I can accomplish this, I would appreciate it.
Thanks,
Mike
Try this:
result = {}
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        result.setdefault(field.name, []).append(table)
If I understand correctly, you want to map from a column name to a list of tables that have columns with that name. That should be easy enough to do with a defaultdict:
from collections import defaultdict

header_to_table_dict = defaultdict(list)
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        header_to_table_dict[field.name].append(table.name)
I'm not sure if table.name is what you want to save, exactly, but this should get you on the right track.
You want to create a dictionary in which each key is a field name, and each value is a list of table names:
# initialize the dictionary
col_index = {}
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        if field.name not in col_index:
            # this is a field name we haven't seen before,
            # so initialize a dictionary entry with an empty list
            # as the corresponding value
            col_index[field.name] = []
        # add the table name to the list of tables for this field name
        col_index[field.name].append(table.name)
And then, if you want a list of tables that contain the field LastName:
list_of_tables = col_index['LastName']
If you're using a database that is case-insensitive with respect to column names, you might want to convert field.name to upper case before testing the dictionary.
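For example, a sketch of that normalization (upper-casing is an arbitrary but consistent choice):

import arcpy

col_index = {}
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        # normalize the key so 'LastName' and 'LASTNAME' land in one entry
        col_index.setdefault(field.name.upper(), []).append(table)

list_of_tables = col_index['LastName'.upper()]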
Is there a possibility to retrieve random rows from Cassandra (using it with Python/Pycassa)?
Update: With random rows I mean randomly selected rows!
You might be able to do this by making a get_range request with a random start key (just a random string), and a row_count of 1.
From memory, I think the finish key would need to be the same as start, so that the query 'wraps around' the keyspace; this would normally return all rows, but the row_count will limit that.
Haven't tried it but this should ensure you get a single result without having to know exact row keys.
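Untested, but a sketch of what that would look like (the keyspace and column family names are placeholders):

import uuid
import pycassa.pool
import pycassa.columnfamily

pool = pycassa.pool.ConnectionPool('keyspace', ['localhost:9160'])
cf = pycassa.columnfamily.ColumnFamily(pool, 'cfname')

start = uuid.uuid4().hex  # a random string key
# finish == start should make the range wrap around the keyspace;
# row_count=1 then limits the result to a single row
rows = cf.get_range(start=start, finish=start, row_count=1)
key, columns = next(iter(rows))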
Not sure what you mean by random rows. If you mean random access rows, then sure you can do it very easily:
import pycassa.pool
import pycassa.columnfamily

pool = pycassa.pool.ConnectionPool('keyspace', ['localhost:9160'])
cf = pycassa.columnfamily.ColumnFamily(pool, 'cfname')
row = cf.get('row_key')
That will give you any row. If you mean that you want a randomly selected row, I don't think you'd be able to do that very easily without knowing what the keys are. You could generate an index row, select a random column from it, and use that to grab a row from another column family. Basically, you'd need to create a new row where each column value was a row key from the column family from which you are trying to select a row. Then you could grab a column randomly from that row, and you would have the key to a random row.
I don't think pycassa offers any support to grab a random, non-indexed row.
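A rough sketch of that index-row idea (the cf_index column family and the all_keys row are hypothetical, and the index row has to be maintained as data rows are added):

import random
import pycassa.pool
import pycassa.columnfamily

pool = pycassa.pool.ConnectionPool('keyspace', ['localhost:9160'])
data_cf = pycassa.columnfamily.ColumnFamily(pool, 'cfname')
index_cf = pycassa.columnfamily.ColumnFamily(pool, 'cf_index')

# each column in the index row holds one data row key as its value
index_row = index_cf.get('all_keys')
random_key = random.choice(list(index_row.values()))
row = data_cf.get(random_key)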
This works for my case:
import random

ini = random.randint(0, 999999999)
rows = col_fam.get_range(str(ini), row_count=1, column_count=0, filter_empty=False)
You'll have to adapt this to your row key type (a string in my case).