Handling command output in Python

I want to work with the output of a wifi scan command. The output spans several lines, and I am interested in two pieces of information from it. The goal is to have the ESSID and the address in a two-dimensional array (I hope that's the right term). Here is what I got so far:
#!/usr/bin/python
import subprocess
import re
from time import sleep
# set wifi interface
wif = "wlan0"
So I get the command's stdout, and I found that to work with this output in a loop I have to use iter:
# check for WiFis nearby
wifi_out = subprocess.Popen(["iwlist", wif ,"scan"],stdout=subprocess.PIPE)
wifi_data = iter(wifi_out.stdout.readline,'')
Then I used enumerate to have the index; I search for the line with the address, and the next line (index + 1) would contain the ESSID:
for index, line in enumerate(wifi_data):
    searchObj = re.search( r'.* Cell [0-9][0-9] - Address: .*', line, re.M|re.I)
    if searchObj:
        print index, line
        word = line.split()
        wifi = [word[4],wifi_data[index + 1]]
Now I have two problems
1) wifi_data is the wrong type:
TypeError: 'callable-iterator' object has no attribute '__getitem__'
2) I guess with
wifi = [word[4],wifi_data[index + 1]]
I reassign the variable each time instead of appending to something. But I want a variable that in the end holds all ESSIDs together with their corresponding addresses.
I am new to Python, so currently I imagine something like:
WIFI[0][0] returns ESSID
WIFI[0][1] returns address to ESSID in WIFI[0][0]
WIFI[1][0] returns next ESSID
WIFI[1][1] returns address to ESSID in WIFI[1][0]
and so on. Or would something else in Python be better for working with this kind of information?

I think you want
next(wifi_data)
since you cannot index into an iterator ... this will give you the next item ... but it may screw up your loop ...
although really you could just do
wifi_out = subprocess.Popen(["iwlist", wif ,"scan"],stdout=subprocess.PIPE)
wifi_data = wifi_out.communicate()[0].splitlines()
or even easier perhaps
wifi_data = subprocess.check_output(["iwlist",wif,"scan"]).splitlines()
and then you will have a list ... which will work more like you expect with regard to accessing the data by index (there's not really a good reason to use an iter for this that I can tell)
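Putting that together, here is a minimal sketch of the whole parse built on check_output, assuming the usual iwlist layout where each cell's "Address:" line is followed a few lines later by an ESSID:"..." line (the exact output varies by driver, so treat the regexes as a starting point):
import re
import subprocess

wif = "wlan0"
lines = subprocess.check_output(["iwlist", wif, "scan"]).splitlines()

wifis = []       # ends up as [[essid, address], [essid, address], ...]
address = None
for line in lines:
    addr = re.search(r'Cell \d\d - Address: (\S+)', line)
    if addr:
        address = addr.group(1)
    essid = re.search(r'ESSID:"([^"]*)"', line)
    if essid and address is not None:
        wifis.append([essid.group(1), address])
        address = None   # pair each ESSID with the most recent address
wifis[0][0] then returns the first ESSID and wifis[0][1] its address, exactly as described in the question.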


How to use regex to dynamically find a value in a file in Python

I have a long string like this for example:
V:"production",PUBLIC_URL:"",WDS_SOCKET_HOST:void 0,WDS_SOCKET_PATH:void 0,WDS_SOCKET_PORT:void 0,FAST_REFRESH:!0,REACT_APP_CANDY_MACHINE_ID:"9mn5duMPUeNW5AJfbZWQgs5ivtiuYvQymqsCrZAenEdW",REACT_APP_SOLANA_NETWORK:"mainnet-beta
and I need to get the value of REACT_APP_CANDY_MACHINE_ID with regex. The value is always 44 characters long, so that is a good thing, I hope. Also, the file/string I'm pulling it from is much, much longer, and REACT_APP_CANDY_MACHINE_ID appears multiple times, but its value doesn't change.
You don't need regex for that; just use index() to get the location of REACT_APP_CANDY_MACHINE_ID.
data = 'V:"production",PUBLIC_URL:"",WDS_SOCKET_HOST:void 0,WDS_SOCKET_PATH:void 0,WDS_SOCKET_PORT:void 0,FAST_REFRESH:!0,REACT_APP_CANDY_MACHINE_ID:"9mn5duMPUeNW5AJfbZWQgs5ivtiuYvQymqsCrZAenEdW",REACT_APP_SOLANA_NETWORK:"mainnet-beta'
key = "REACT_APP_CANDY_MACHINE_ID"
start = data.index(key) + len(key) + 2
print(data[start: start + 44])
# 9mn5duMPUeNW5AJfbZWQgs5ivtiuYvQymqsCrZAenEdW
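If you do want a regex anyway (for example, to avoid the offset arithmetic), a sketch like this should also work; it captures whatever sits between the quotes after the key:
import re

match = re.search(r'REACT_APP_CANDY_MACHINE_ID:"([^"]+)"', data)
if match:
    print(match.group(1))  # 9mn5duMPUeNW5AJfbZWQgs5ivtiuYvQymqsCrZAenEdW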

Is there a way to obtain the actual A1 range from sheetfu - get_data_range()

I am trying to obtain the actual A1 values using the Sheetfu library's get_data_range().
When I use the code below, it works perfectly, and I get what I would expect.
invoice_sheet = spreadsheet.get_sheet_by_name('Invoice')
invoice_data_range = invoice_sheet.get_data_range()
invoice_values = invoice_data_range.get_values()
print(invoice_data_range)
print(invoice_values)
From the print() statements I get:
<Range object Invoice!A1:Q42>
[['2019-001', '01/01/2019', 'Services']...] #cut for brevity
What is the best way to get that "A1:Q42" value? I really only want the end of the range (Q42), because I need to build the get_range_from_a1() argument "A4:Q14". My sheet has known headers (rows 1-3), and the get_values() includes 3 rows that I don't want in the get_values() list.
I guess I could do some string manipulation to pull out the text between the ":" and ">" in
<Range object Invoice!A1:Q42>
...but that seems a bit sloppy.
As a quick aside, it would be fantastic to be able to call get_data_range() like so:
invoice_sheet = spreadsheet.get_sheet_by_name('Invoice')
invoice_data_range = invoice_sheet.get_data_range(start="A4", end="")
invoice_values = invoice_data_range.get_values()
...but that's more like a feature request. (Which I'm happy to do BTW).
Author here. Alan answers it well.
I added some methods at the Range level of the library that are simply shortcuts to the coordinates properties.
from sheetfu import SpreadsheetApp
spreadsheet = SpreadsheetApp("....access_file.json").open_by_id('long_string_id')
sheet = spreadsheet.get_sheet_by_name('test')
data_range = sheet.get_data_range()
starting_row = data_range.get_row()
starting_column = data_range.get_column()
max_row = data_range.get_max_row()
max_column = data_range.get_max_column()
This will effectively tell you the max row and max column that contains data in your sheet.
If you use the get_data_range method, the first row and first column will typically be 1.
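From these you can build the A1 string the question asks about. A sketch, where the column-to-letter helper is my own addition rather than part of Sheetfu:
def column_letter(n):
    # convert a 1-based column number into A1 letters, e.g. 17 -> 'Q'
    letters = ''
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord('A') + rem) + letters
    return letters

# skip the three header rows and take the rest of the data block
a1 = 'A4:%s%d' % (column_letter(max_column), max_row)
body_range = sheet.get_range_from_a1(a1)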
I received a response from the owner of Sheetfu, and the following code provides the information that I'm looking for.
Example code:
from sheetfu import SpreadsheetApp
spreadsheet = SpreadsheetApp("....access_file.json").open_by_id('long_string_id')
sheet = spreadsheet.get_sheet_by_name('test')
data_range = sheet.get_data_range()
range_max_row = data_range.coordinates.row + data_range.coordinates.number_of_rows - 1
range_max_column = data_range.coordinates.column + data_range.coordinates.number_of_columns - 1
As of this writing, the .coordinates properties are not currently documented, but they are usable, and should be officially documented within the next couple of weeks.

How do I read an HTML file in Python from multiple URLs?

I'm writing a script that will pull data from a basic HTML page based on the following:
The first parameter in the URL is a float between -90.0 and 90.0 (inclusive) and the second is between -180.0 and 180.0 (inclusive). Each URL directs you to a page with a single number as its body (for example, http://jawbone-virality.herokuapp.com/scanner/desert/-89.7/131.56/). I need to find the largest virality number across all of these pages.
So, right now I have it printing the first and second number, as well as the number in the body (we call it virality). It only prints to the console; every time I try writing it to a file, it fails with errors. Any hints? I'm very new to Python, so I'm not sure if I'm missing something.
import shutil
import os
import time
import datetime
import math
import urllib
from array import array
myFile = open('test.html','w')
m = 5
for x in range(-900,900,1):
    for y in range(-1800,1800,1):
        filehandle = urllib.urlopen('http://jawbone-virality.herokuapp.com/scanner/desert/'+str(x/10)+'/'+str(y/10)+'/')
        print 'Planet Desert: (' + str(x/10) +','+ str(y/10) + '), Virality: ' + filehandle.readlines()[0] #lines
        #myFile.write('Planet Desert: (' + str(x/10) +','+ str(y/10) + '), Virality: ' + filehandle.readlines()[0])
myFile.close()
filehandle.close()
Thank you!
When writing to the file, do you still have the print statement before? Then your problem would be that Python advances the file pointer to the end of the file when you call readlines(). The second call to readlines() will thus return an empty list and your access to the first element results in an IndexError.
See this example execution:
filehandle = urllib.urlopen('http://jawbone-virality.herokuapp.com/scanner/desert/0/0/')
print(filehandle.readlines()) # prints ['5']
print(filehandle.readlines()) # prints []
The solution is to save the result into a variable and then use it.
filehandle = urllib.urlopen('http://jawbone-virality.herokuapp.com/scanner/desert/0/0/')
res = filehandle.readlines()[0]
print(res) # prints 5
print(res) # prints 5
Yet, as already pointed out in the comments, calling readlines() here is not needed, because the page body appears to be just a single integer. The concept of lines does not really exist there, or at least does not provide any more information, so let's drop it in favour of the simpler read().
filehandle = urllib.urlopen('http://jawbone-virality.herokuapp.com/scanner/desert/0/0/')
res = filehandle.read()
print(res) # prints 5
There's still another problem in your source code. From your usage of urllib.urlopen() I can tell that you are using Python 2. However, in Python 2, dividing two integers yields an integer rounded down (floor division), so str(x/10) never produces the decimals you want. Thus, you will call http://jawbone-virality.herokuapp.com/scanner/desert/-90/-180/ ten times.
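You can see the floor behaviour directly:
print -899 / 10     # prints -90 in Python 2 (floored), not -89.9
print -899 / 10.0   # prints -89.9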
This can be fixed by either:
from __future__ import division
str(x / 10.0) and str(y / 10.0)
switching to Python 3 and using urllib.request (urllib2 only exists in Python 2)
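Putting the fixes together, here is a sketch of the loop that tracks the maximum (Python 2, untested; note that with the question's ranges it issues 1800 * 3600 requests, so it will take a while):
import urllib

best = None   # will hold (virality, latitude, longitude)
for x in range(-900, 900):
    for y in range(-1800, 1800):
        url = ('http://jawbone-virality.herokuapp.com/scanner/desert/'
               + str(x / 10.0) + '/' + str(y / 10.0) + '/')
        handle = urllib.urlopen(url)
        virality = int(handle.read())
        handle.close()
        if best is None or virality > best[0]:
            best = (virality, x / 10.0, y / 10.0)

print 'Largest virality: %d at (%s, %s)' % best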
I hope this helps.

Data analysis for inconsistent string formatting

I have this task that I've been working on, but am having extreme misgivings about my methodology.
So the problem is that I have a ton of Excel files that are formatted strangely (and not consistently) and I need to extract certain fields for each entry. (An example data set was shown as an image.)
My original approach was this:
Export to csv
Separate into counties
Separate into districts
Analyze each district individually, pull out values
write to output.csv
The problem I've run into is that the format (seemingly well organized) is almost random across files. Each line contains the same fields, but in a different order, spacing, and wording. I wrote a script to correctly process one file, but it doesn't work on any other files.
So my question is, is there a more robust method of approaching this problem rather than simple string processing? What I had in mind was more of a fuzzy logic approach for trying to pin which field an item was, which could handle the inputs being a little arbitrary. How would you approach this problem?
If it helps clear up the problem, here is the script I wrote:
# This file takes a tax CSV file as input
# and separates it into counties
# then appends each county's entries onto
# the end of the master out.csv
# which will contain everything including
# taxes, bonds, etc from all years
#import the data csv
import sys
import re
import csv
def cleancommas(x):
    toggle=False
    for i,j in enumerate(x):
        if j=="\"":
            toggle=not toggle
        if toggle==True:
            if j==",":
                x=x[:i]+" "+x[i+1:]
    return x

def districtatize(x):
    #list indexes of entries starting with "for" or "to" of length >5
    indices=[1]
    for i,j in enumerate(x):
        if len(j)>2:
            if j[:2]=="to":
                indices.append(i)
        if len(j)>3:
            if j[:3]==" to" or j[:3]=="for":
                indices.append(i)
        if len(j)>5:
            if j[:5]==" \"for" or j[:5]==" \'for":
                indices.append(i)
        if len(j)>4:
            if j[:4]==" \"to" or j[:4]==" \'to" or j[:4]==" for":
                indices.append(i)
    if len(indices)==1:
        return [x[0],x[1:len(x)-1]]
    new=[x[0],x[1:indices[1]+1]]
    z=1
    while z<len(indices)-1:
        new.append(x[indices[z]+1:indices[z+1]+1])
        z+=1
    return new
    #should return a list of lists. First entry will be county
    #each successive element in list will be list by district

def splitforstos(string):
    for itemind,item in enumerate(string): # take all exception cases that didn't get processed
        splitfor=re.split('(?<=\d)\s\s(?=for)',item) # correctly and split them up so that the for begins
        splitto=re.split('(?<=\d)\s\s(?=to)',item) # a cell
        if len(splitfor)>1:
            print "\n\n\nfor detected\n\n"
            string.remove(item)
            string.insert(itemind,splitfor[0])
            string.insert(itemind+1,splitfor[1])
        elif len(splitto)>1:
            print "\n\n\nto detected\n\n"
            string.remove(item)
            string.insert(itemind,splitto[0])
            string.insert(itemind+1,splitto[1])

def analyze(x):
    #input should be a string of content
    #target values are nomills,levytype,term,yearcom,yeardue
    clean=cleancommas(x)
    countylist=clean.split(',')
    emptystrip=filter(lambda a: a != '',countylist)
    empt2strip=filter(lambda a: a != ' ', emptystrip)
    singstrip=filter(lambda a: a != '\' \'',empt2strip)
    quotestrip=filter(lambda a: a !='\" \"',singstrip)
    splitforstos(quotestrip)
    distd=districtatize(quotestrip)
    print '\n\ndistrictized\n\n',distd
    county = distd[0]
    for x in distd[1:]:
        if len(x)>8:
            district=x[0]
            vote1=x[1]
            votemil=x[2]
            spaceindex=[m.start() for m in re.finditer(' ', votemil)][-1]
            vote2=votemil[:spaceindex]
            mills=votemil[spaceindex+1:]
            votetype=x[4]
            numyears=x[6]
            yearcom=x[8]
            yeardue=x[10]
            reason=x[11]
            data = [filename,county,district, vote1, vote2, mills, votetype, numyears, yearcom, yeardue, reason]
            print "data",data
        else:
            print "x\n\n",x
            district=x[0]
            vote1=x[1]
            votemil=x[2]
            spaceindex=[m.start() for m in re.finditer(' ', votemil)][-1]
            vote2=votemil[:spaceindex]
            mills=votemil[spaceindex+1:]
            votetype=x[4]
            special=x[5]
            splitspec=special.split(' ')
            try:
                forind=[i for i,j in enumerate(splitspec) if j=='for'][0]
                numyears=splitspec[forind+1]
                yearcom=splitspec[forind+6]
            except:
                forind=[i for i,j in enumerate(splitspec) if j=='commencing'][0]
                numyears=None
                yearcom=splitspec[forind+2]
            yeardue=str(x[6])[-4:]
            reason=x[7]
            data = [filename,county,district,vote1,vote2,mills,votetype,numyears,yearcom,yeardue,reason]
            print "data other", data
        openfile=csv.writer(open('out.csv','a'),delimiter=',', quotechar='|',quoting=csv.QUOTE_MINIMAL)
        openfile.writerow(data)

# call the file like so: python tax.py 2007May8Tax.csv
filename = sys.argv[1] #the file is the first argument
f=open(filename,'r')
contents=f.read() #entire csv as string
#find index of every instance of the word county
separators=[m.start() for m in re.finditer('\w+\sCOUNTY',contents)] #alternative implementation in regex
# split contents into sections by county
# analyze each section and append to out.csv
for x,y in enumerate(separators):
    try:
        data = contents[y:separators[x+1]]
    except:
        data = contents[y:]
    analyze(data)
is there a more robust method of approaching this problem rather than simple string processing?
Not really.
What I had in mind was more of a fuzzy logic approach for trying to pin which field an item was, which could handle the inputs being a little arbitrary. How would you approach this problem?
After a ton of analysis and programming, it won't be significantly better than what you've got.
Reading stuff prepared by people requires -- sadly -- people-like brains.
You can mess with NLTK to try and do a better job, but it doesn't work out terribly well either.
You don't need a radically new approach. You need to streamline the approach you have.
For example.
district=x[0]
vote1=x[1]
votemil=x[2]
spaceindex=[m.start() for m in re.finditer(' ', votemil)][-1]
vote2=votemil[:spaceindex]
mills=votemil[spaceindex+1:]
votetype=x[4]
numyears=x[6]
yearcom=x[8]
yeardue=x[10]
reason=x[11]
data = [filename,county,district, vote1, vote2, mills, votetype, numyears, yearcom, yeardue, reason]
print "data",data
Might be improved by using a named tuple.
Then build something like this.
data = SomeSensibleName(
    district= x[0],
    vote1=x[1], ... etc.
)
So that you're not creating a lot of intermediate (and largely uninformative) loose variables.
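A concrete version of that sketch might look like this (the name Levy and the field list are illustrative, mirroring the loose variables above):
from collections import namedtuple

Levy = namedtuple('Levy',
    ['district', 'vote1', 'vote2', 'mills', 'votetype',
     'numyears', 'yearcom', 'yeardue', 'reason'])

data = Levy(district=x[0], vote1=x[1], vote2=vote2, mills=mills,
            votetype=x[4], numyears=x[6], yearcom=x[8],
            yeardue=x[10], reason=x[11])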
Also, keep looking at your analyze function (and any other function) to pull out the various "pattern matching" rules. The idea is that you'll examine a county's data, step through a bunch of functions until one matches the pattern; this will also create the named tuple. You want something like this.
for p in ( some, list, of, functions ):
    match = p(data)
    if match:
        return match
Each function either returns a named tuple (because it liked the row) or None (because it didn't like the row).
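For instance, one pattern function might look like this sketch (reusing the hypothetical Levy tuple, with the length test standing in for whatever actually distinguishes one row layout from another):
def match_long_form(row):
    # recognizes the 12-field layout; returns None so the next
    # pattern function gets a chance when this one doesn't apply
    if len(row) <= 8:
        return None
    votemil = row[2]
    space = votemil.rindex(' ')   # last space splits vote2 from mills
    return Levy(district=row[0], vote1=row[1],
                vote2=votemil[:space], mills=votemil[space+1:],
                votetype=row[4], numyears=row[6], yearcom=row[8],
                yeardue=row[10], reason=row[11])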

Loop on Select_Analysis tool (Python and ArcGIS 9.3)

First, I'm new to Python and I work with ArcGIS 9.3.
I'd like to run a loop over the "Select_Analysis" tool. I have a layer "stations" composed of all the bus stations of a city.
The layer has a field "rte_id" that says on which line a station is located.
I'd like to save all the stations with "rte_id" = 1 to one layer, the stations with "rte_id" = 2 to another, and so on. Hence the use of the Select_Analysis tool.
So, I decided to make a loop (I have 70 different "rte_id" values .... so 70 different layers to create!). But it does not work and I'm totally lost!
Here is my code:
import arcgisscripting, os, sys, string
gp = arcgisscripting.create(9.3)
gp.AddToolbox("C:/Program Files (x86)/ArcGIS/ArcToolbox/Toolboxes/Data Management Tools.tbx")
stations = "d:/Travaux/NantesMetropole/Traitements/SIG/stations.shp"
field = "rte_id"
for i in field:
    gp.Select_Analysis (stations, "d:/Travaux/NantesMetropole/Traitements/SIG/stations_" + i + ".shp", field + "=" + i)
    i = i+1
    print "ok"
And here is the error message:
gp.Select_Analysis (stations, "d:/Travaux/NantesMetropole/Traitements/SIG/stations_" + i + ".shp", field + "=" + i)
TypeError: can only concatenate list (not "str") to list
Have you got any ideas to solve my problem?
Thanks in advance!
Julien
The main problem here is in the line
for i in field:
You are iterating over a string - the field name ("rte_id") - character by character.
That is not what you want.
You need to iterate over all possible values of the field "rte_id".
Easiest solution:
if you know that the field "rte_id" has values 1 - 70 (for example), then you can try
for i in range(1, 71):
    shp_name = "d:/Travaux/NantesMetropole/Traitements/SIG/stations_" + str(i) + ".shp"
    expression = '{0} = {1}'.format(field, i)
    gp.Select_Analysis (stations, shp_name , expression)
    print "ok"
More sophisticated solution:
You need to get a list of all unique values of the field "rte_id" - in SQL terms, to perform a GROUP BY.
I think it is not actually possible to perform a GROUP BY operation on SHP files with one tool.
You can use a SearchCursor, iterate through all features, and generate a list of unique values of your field - see the sketch below. But this is a more complex task.
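A sketch of that cursor approach (untested and written from memory of the 9.3 geoprocessor API, so double-check the method names):
# collect the unique rte_id values with a SearchCursor
unique_ids = set()
rows = gp.SearchCursor(stations)
row = rows.Next()
while row:
    unique_ids.add(row.GetValue(field))
    row = rows.Next()

# then export one layer per value, as in the simple solution above
for i in unique_ids:
    shp_name = "d:/Travaux/NantesMetropole/Traitements/SIG/stations_" + str(i) + ".shp"
    gp.Select_Analysis (stations, shp_name, '{0} = {1}'.format(field, i))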
Another way is to use the Summarize option on the shapefile table in ArcMap (open the table, right-click on the column header). You will get a dbf table with unique values, which you can read in your script.
I hope it will help you get started! I don't have ArcGIS right now, so I can't write and verify an actual script.
You will need to make substantial changes to this code in order to get it to do what you want. You may just want to download the Split Layer By Attribute code from ArcGIS Online, which does exactly the same thing.
