I want to develop a script to update individual cells (the row of a specific column) of an attribute table based on the value of the cell that comes immediately before it, as well as on data in other columns in the same row. I'm sure this can be done with cursors, but I'm having trouble conceptualizing exactly how to tackle it.
Essentially what I want to do is this:
If Column A, row 13 = a certain value AND Column B, row 13 = a certain value (but different from A), then change Column A, row 13 to be the same value as Column A, row 12.
If this can't be done with cursors, then maybe some kind of array or matrix, or a list of lists, would be the way to go? I'm basically looking for the best direction to take with this.
EDIT: My files are shapefiles, and I also have them in .csv format. My code is really basic right now:
import arcpy
from arcpy import env

env.workspace = "C:/All Data Files/My Documents All/My Documents/wrk"
inputLyr = "C:/All Data Files/My Documents All/My Documents/wrk/file.lyr"

fields = ["time", "lon", "activityIn", "fixType"]
cursor180 = arcpy.da.SearchCursor(inputLyr, fields, '"lon" = -180')
for row in cursor180:
    # Print the rows that have no data, along with activity intensity
    print(row[0], row[1], row[2])
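For reference, the row-by-row update described at the top maps naturally onto an arcpy.da.UpdateCursor that carries the previous row's value in a variable. A minimal sketch, assuming hypothetical field names ("colA", "colB") and placeholder trigger values:

import arcpy

trigger_a = -180     # placeholder: the Column A value that flags a cell to fix
trigger_b = "NoFix"  # placeholder: the Column B value that must also match

with arcpy.da.UpdateCursor(inputLyr, ["colA", "colB"]) as cursor:
    prev_a = None
    for row in cursor:
        if prev_a is not None and row[0] == trigger_a and row[1] == trigger_b:
            row[0] = prev_a          # copy Column A down from the previous row
            cursor.updateRow(row)
        prev_a = row[0]              # remember this row's (possibly fixed) value

This assumes the cursor returns rows in the order you care about; if not, sort on a sequence field first (for example via a sql_clause ORDER BY, where the data source supports it).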
I'm sending API calls to Google Sheets to retrieve information like so:
import ctypes
import sys

import gspread

gc = gspread.authorize(credentials)  # credentials set up earlier (omitted here)

def grab_available_row(wks):
    # Index of the first row with nothing yet in column 17
    str_list = list(filter(None, wks.col_values(17)))
    return str(len(str_list) + 1)

wks = gc.open("test").worksheet("Logs")
grab_row = grab_available_row(wks)
try:
    GrabRequestTest = wks.acell("B{}".format(grab_row)).value
except Exception:
    pass
try:
    print(GrabRequestTest)
    ctypes.windll.user32.MessageBoxW(0, "DONE!!!", "DONE!!!", 1)
    sys.exit()
except Exception:
    pass
With this, I can retrieve information in any row if there is no value present in column #17. In other words, this essentially reads from the first available row without anything in column #17. If I put an X in column 17, it will read the row below it. This isn't exactly what I'm looking for.
I'd like to be able to print all values in a row where a specific character like X is present in column 17, and ignore all other rows. I'd then take the data from each row with X present in column 17 and use mail merge to generate a bunch of .docx files. I can easily figure out the second part. Anybody know how to accomplish the first part? (print values in a specific row where X is present in column 17)
From your description of wanting to print all values in a row where a specific character like X is present in column 17, ignoring all other rows, I believe your goal in this question is as follows.
You want to retrieve the rows filtered by a specific value in column 17 (column "Q").
You want to achieve this using gspread for Python.
In this case, how about the following modification?
Modified script:
gc = gspread.authorize(credentials)
wks = gc.open("test").worksheet("Logs")
search = "X" # Please set the search value you expect.
values = [r for r in wks.get_all_values() if r[16] == search]
print(values)
When this script is run, the rows whose column "Q" matches the value of search are retrieved as a two-dimensional list.
Added:
From the following reply,
This is almost it! How can you print by column only? like, I only want the value from column 3 from the array.. print(values[4]) doesn't seem to work.
In this case, how about the following sample script?
Sample script:
gc = gspread.authorize(credentials)
wks = gc.open("test").worksheet("Logs")
search = "X" # Please set the search value you expect.
col = 3 # From your reply, the values of column "C" are retrieved.
values = [r[col - 1] for r in wks.get_all_values() if r[16] == search]
print(values)
I need to compare two DataFrames at a time to find out whether the values match. One DataFrame comes from an Excel workbook and the other from a SQL query. The problem is that not only might the columns be out of sequence, but the column headers might have different names as well. That prevents me from simply getting the Excel column headers and using them to rearrange the columns of the SQL DataFrame. In addition, I will be doing this across several tabs in an Excel workbook and against different queries. Not only do the column names differ from Excel to SQL, they may also differ from Excel to Excel and SQL to SQL.
I did create a solution, but not only is it very choppy, I'm also concerned it will take up a considerable amount of memory.
The solution uses lists inside a list. If the Excel header is in the same list as the SQL header, the two are considered a match, and the function returns the order the SQL DataFrame's columns must take to match the order of the Excel DataFrame. In case I missed some possibilities and the newly created order list ends up a different length than needed, I simply return the SQL headers in their original order.
The example below is barely a fraction of what I will actually be working with. The actual number of variations and column names is much higher than in the example. Any suggestions on how to improve this function, or a better solution to this problem, would be appreciated.
Here is an example:
# Example data
exceltab1 = {'ColA': [1, 2, 3],
             'ColB': [3, 4, 1],
             'ColC': [4, 1, 2]}
exceltab2 = {'cColumn': [10, 15, 17],
             'aColumn': [5, 7, 8],
             'bColumn': [9, 8, 7]}
sqltab1 = {'Col/A': [1, 2, 3],
           'Col/C': [4, 1, 2],
           'Col/B': [3, 4, 1]}
sqltab2 = {'col_banana': [9, 8, 7],
           'col_apple': [5, 7, 8],
           'col_carrot': [10, 15, 17]}

# Code
import pandas as pd

ec1 = pd.DataFrame(exceltab1)
ec2 = pd.DataFrame(exceltab2)
sq1 = pd.DataFrame(sqltab1)
sq2 = pd.DataFrame(sqltab2)

# This will fail because the columns are out of order
result1 = (ec1.values == sq1.values).all()

def translate(excel_headers, sql_headers):
    translator = [["ColA", "aColumn", "Col/A", "col_apple"],
                  ["ColB", "bColumn", "Col/B", "col_banana"],
                  ["ColC", "cColumn", "Col/C", "col_carrot"]]
    order = []
    for header in excel_headers:
        for group in translator:
            for item in sql_headers:
                if header in group and item in group:
                    order.append(item)
                    break
    # Fall back to the original order if any column went unmatched
    if len(order) != len(sql_headers):
        return sql_headers
    return order

sq1 = sq1[translate(list(ec1.columns), list(sq1.columns))]

# This will pass because the columns now line up
result2 = (ec1.values == sq1.values).all()
print(f"Result 1: {result1} , Result 2: {result2}")
Result:
Result 1: False , Result 2: True
No code, but an algorithm.
We have a set of columns A and another set B. We can compare a column from A with a column from B and see whether they're equal. We do that for all combinations of columns.
This can be seen as a bipartite graph where there are two groups of vertices A and B (one vertex for each column), and an edge exists between two vertices if those two columns are equal. Then the problem of translating column names is equivalent to finding a perfect matching in this bipartite graph.
An algorithm to do this is Hopcroft–Karp, which has a Python implementation here. It finds maximum matchings, so you still have to check whether it found a perfect matching (that is, each column from A has an associated column from B).
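As an illustration of that reduction (not the linked implementation), here is a sketch using networkx, whose hopcroft_karp_matching finds a maximum matching; the final length check verifies the matching is perfect:

import networkx as nx

def match_columns(df_a, df_b):
    # One vertex per column; an edge joins columns with identical values
    # (assumes both DataFrames have the same number of rows)
    g = nx.Graph()
    for ca in df_a.columns:
        for cb in df_b.columns:
            if (df_a[ca].values == df_b[cb].values).all():
                g.add_edge(('A', ca), ('B', cb))
    top = [('A', c) for c in df_a.columns if ('A', c) in g]
    matching = nx.bipartite.hopcroft_karp_matching(g, top_nodes=top)
    pairs = {a: b for (side, a), (_, b) in matching.items() if side == 'A'}
    if len(pairs) != len(df_a.columns):
        raise ValueError("no perfect matching: some columns are unmatched")
    return pairs

With the example data above, match_columns(ec1, sq1) should map 'ColA' to 'Col/A', 'ColB' to 'Col/B', and 'ColC' to 'Col/C'.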
I am working on a project using Python to select certain values from an Excel file. I am using the xlrd and openpyxl libraries to do this.
The way the Python program should work is:
Grouping all the data point entries that are in a certain card task. These are marked in column E. For example, all of the entries between row 26 and row 28 are in Card Task A, and hence they should be grouped together. All entries without a "Card Task" value in column E should be ignored.
Next…
looking at the value in column N (lastExecTime) of a row and comparing that time with the next row's value in column M.
If the times overlap (the column M value is less than the previous row's column N value), a variable called "count" is incremented. Count stores the number of times a procedure overlaps.
Finally…
As for the output, the goal is to create a separate text file that displays which tasks are overlapping, and how many tasks overlap in a certain Card Task.
The problem that I am running into is that I cannot pair up the data from a card task.
Here is a sample of the Excel data: (screenshots of the spreadsheet were attached to the original post).
And here is the code that I have written that tells me if there are multiple procedures going on:
from openpyxl import load_workbook

book = load_workbook('LearnerSummaryNoFormat.xlsx')
sheet = book['Sheet1']

for row in sheet.rows:
    # str() guards against empty cells, whose value is None
    if str(row[4].value)[:9] != 'Card Task':
        print("Is not a card task: " + str(row[1].value))
Essentially my problem is that I am not able to compare all the values from one card task with each other.
I would read through the data once, as you already do, but store all rows containing 'Card Task' in a separate list. Once you have a list of only the card task items, you can compare them.
card_task_row_object_list = []
for row in sheet.rows:
    if 'Card Task' in str(row[4].value):
        card_task_row_object_list.append(row)
From here you would want to compare the time values. What do you need to check: whether two different card task times overlap?
(index 12 of a row: start, index 13: end)
def compare_times(card_task_row_object_list):
    count = 0
    for row in card_task_row_object_list:
        for comparison_row in card_task_row_object_list:
            if comparison_row is row:
                continue  # don't compare a task with itself
            # Two tasks overlap when each starts before the other ends
            if (comparison_row[12].value < row[13].value
                    and comparison_row[13].value > row[12].value):
                count += 1
    return count
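If instead you only need the sequential check from the question (each row's column M start against the previous row's column N end), a single pass is enough. A sketch, reusing card_task_row_object_list and the index convention above:

def count_sequential_overlaps(card_task_rows):
    # Index 12 is column M (start); index 13 is column N (lastExecTime/end)
    count = 0
    for prev, cur in zip(card_task_rows, card_task_rows[1:]):
        if cur[12].value < prev[13].value:  # starts before the previous one ended
            count += 1
    return count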
I'm hoping to duplicate my techniques for looping through tables in R using Python in the ArcGIS/arcpy framework. Specifically, is there a practical way to loop through the rows of an attribute table using Python and copy that data based on values from previous rows of the table?
For example, using R I would use code similar to the following to copy rows of data from one table that have unique values for a specific variable:
## table name: data
## variable of interest: variable
## new table: new.data
for (i in 2:nrow(data))
{
  if (data$variable[i] != data$variable[i - 1])
  {
    new.data <- rbind(new.data, data[i, ])
  }
}
If I've written the above code correctly then, in words, this for-loop simply checks whether the current value in the table differs from the previous value, and if it is in fact a new value, it appends all column values for that row to the new table. Any help with this thought process would be great.
Thanks!
To just get the unique values in a field of a table in arcpy:
import arcpy
table = "mytable"
field = "my_field"
# ArcGIS 10.0
unique_values = set(row.getValue(field) for row in iter(arcpy.SearchCursor(table).next, None))
# ArcGIS 10.1+
unique_values = {row[0] for row in arcpy.da.SearchCursor(table, field)}
Yes, to loop through values in a table using arcpy you want to use a cursor. It's been a while since I've used arcpy, but if I recall correctly the one you want is a search cursor. In its simplest form, this is what it would look like:
import arcpy

curObj = arcpy.SearchCursor(r"C:/shape.shp")
row = curObj.next()
while row:
    columnValue = row.getValue("columnName")
    row = curObj.next()
As of version 10.1, they introduced a data access cursor, which is orders of magnitude faster. Data access (da) cursors require you to declare which columns you want returned when you create the cursor. Example:
import arcpy

columns = ['column1', 'something', 'someothercolumn']
curObj = arcpy.da.SearchCursor(r"C:/somefile.shp", columns)
for row in curObj:
    print('column1 is', row[0])
    print('someothercolumn is', row[2])
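To reproduce the R loop itself (copy a row into a new table whenever the value changes from the previous row), here is a rough sketch combining a search cursor with an insert cursor; the paths and field names are placeholders, and the target table is assumed to already exist with the same schema:

import arcpy

src = r"C:/somefile.shp"                    # placeholder source table
dst = r"C:/newfile.shp"                     # placeholder target table
fields = ["variable", "field2", "field3"]   # placeholder field names

prev = object()  # sentinel so the first row always counts as a new value
with arcpy.da.SearchCursor(src, fields) as search_cur, \
        arcpy.da.InsertCursor(dst, fields) as insert_cur:
    for row in search_cur:
        if row[0] != prev:
            insert_cur.insertRow(row)  # copy every requested column of this row
        prev = row[0]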
Is there a way to retrieve random rows from Cassandra (using it with Python/Pycassa)?
Update: With random rows I mean randomly selected rows!
You might be able to do this by making a get_range request with a random start key (just a random string), and a row_count of 1.
From memory, I think the finish key would need to be the same as start, so that the query 'wraps around' the keyspace; this would normally return all rows, but the row_count will limit that.
Haven't tried it but this should ensure you get a single result without having to know exact row keys.
Not sure what you mean by random rows. If you mean random-access rows, then sure, you can do that very easily:
import pycassa.pool
import pycassa.columnfamily

pool = pycassa.pool.ConnectionPool('keyspace', ['localhost:9160'])
cf = pycassa.columnfamily.ColumnFamily(pool, 'cfname')
row = cf.get('row_key')
That will give you any row. If you mean that you want a randomly selected row, I don't think you could do that very easily without knowing what the keys are. You could generate an index row, select a random column from it, and use that to grab a row from another column family. Basically, you'd create a new row where each column value is a row key from the column family from which you are trying to select. Then you grab a random column from that row, and you have the key to a random row.
I don't think pycassa offers any support to grab a random, non-indexed row.
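A rough sketch of that index-row idea, assuming a hypothetical 'Indexes' column family whose 'all_keys' row stores one column per data row key:

import random

import pycassa.pool
import pycassa.columnfamily

pool = pycassa.pool.ConnectionPool('keyspace', ['localhost:9160'])
index_cf = pycassa.columnfamily.ColumnFamily(pool, 'Indexes')  # hypothetical CF
data_cf = pycassa.columnfamily.ColumnFamily(pool, 'cfname')

# The index row maps column names to row keys of the data column family
key_columns = index_cf.get('all_keys')  # hypothetical index row
random_key = random.choice(list(key_columns.values()))
row = data_cf.get(random_key)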
This works for my case:
import random

ini = random.randint(0, 999999999)
rows = col_fam.get_range(str(ini), row_count=1, column_count=0, filter_empty=False)
You'll have to adapt this to your row key type (a string, in my case).