I am trying to parse a string as below using PyParsing.
R1# show ip bgp
BGP table version is 2, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
>  10.1.1.0/24      192.168.1.2              0             0 200 i
Note that the LocPrf value is empty here, but it can be a number.
ipField = Word(nums, max=3)
ipAddr = Combine(ipField + "." + ipField + "." + ipField + "." + ipField)
status_code = Combine(Optional(oneOf("s d h * r")) + ">" + Optional(Literal("i")))
prefix = Combine(ipAddr + Optional(Literal("/") + Word(nums, max=2)))
next_hop = ipAddr
med = Word(nums)
local_pref = Word(nums) | White()
path = Group(OneOrMore(Word(nums)))
origin = oneOf("i e ?")
This is the grammar.
g = status_code + prefix + next_hop + med + local_pref + Suppress(Word(nums)) + Optional(path) + origin
I just need to parse the route line (the last line of the output above), but this does not parse it properly: it assigns the Weight value to LocPrf.
Please look over the following code example (which I will include in the next pyparsing release). You should be able to adapt it to your application:
from pyparsing import col,Word,Optional,alphas,nums,ParseException

table = """\
12345678901234567890
COLOR      S   M   L
RED       10   2   2
BLUE           5  10
GREEN      3       5
PURPLE         8"""

# function to create column-specific parse actions
def mustMatchCols(startloc,endloc):
    def pa(s,l,t):
        if not startloc <= col(l,s) <= endloc:
            raise ParseException(s,l,"text not in expected columns")
    return pa

# helper to define values in a space-delimited table
def tableValue(expr, colstart, colend):
    return Optional(expr.copy().addParseAction(mustMatchCols(colstart,colend)))

# define the grammar for this simple table
colorname = Word(alphas)
integer = Word(nums).setParseAction(lambda t: int(t[0])).setName("integer")
row = (colorname("name") +
       tableValue(integer, 11, 12)("S") +
       tableValue(integer, 15, 16)("M") +
       tableValue(integer, 19, 20)("L"))

# parse the sample text - skip over the header and counter lines
for line in table.splitlines()[2:]:
    print
    print line
    print row.parseString(line).dump()
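Applied to the BGP table, the same column-matching idea lets an empty LocPrf fall through to Weight. Here is a Python 3 sketch; the column ranges below are made up to fit the illustrative rows in the comment, not taken from real `show ip bgp` output, so measure yours against the header line first:

```python
from pyparsing import Word, nums, Optional, col, ParseException

def must_match_cols(startloc, endloc):
    # reject a numeric token unless it starts within the given columns
    def pa(s, l, t):
        if not startloc <= col(l, s) <= endloc:
            raise ParseException(s, l, "text not in expected columns")
    return pa

integer = Word(nums)

# Column ranges chosen to fit the illustrative rows below, not a real router:
#          111111111122
# 123456789012345678901
#   10        100    50    <- Metric, LocPrf, Weight
#   10               50    <- LocPrf empty
metric = Optional(integer.copy().addParseAction(must_match_cols(1, 6)))("metric")
locprf = Optional(integer.copy().addParseAction(must_match_cols(9, 14)))("locprf")
weight = Optional(integer.copy().addParseAction(must_match_cols(17, 22)))("weight")
row = metric + locprf + weight

full = row.parseString("  10        100    50")
gap = row.parseString("  10               50")
print(full.dump())
print(gap.dump())  # no "locprf" result name; the 50 lands in "weight"
```

Because each number is validated by where it starts, the 50 in the second row is rejected by the LocPrf expression and correctly parsed as Weight.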
I am trying to get buy and sell orders from the Binance API (python-binance), which has a limit of 500 values (500 asks, 500 bids). I can already do this by creating 500 variables with index numbers, but it seems to me there has to be a better way than writing 500 lines of code.
This is the code I am using:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from binance.client import Client

user_key = ''
secret_key = ''
binance_client = Client(user_key, secret_key)

while True:
    alis = binance_client.futures_order_book(symbol='XRPUSDT')
    binance_buy = alis['bids'][0]
    binance_buy1 = alis['bids'][1]
    binance_buy2 = alis['bids'][2]
    binance_buy3 = alis['bids'][3]
    binance_buy4 = alis['bids'][4]
    binance_buy5 = alis['bids'][5]
    binance_buy6 = alis['bids'][6]
    binance_buy7 = alis['bids'][7]
    binance_buy8 = alis['bids'][8]
    binance_buy9 = alis['bids'][9]
    binance_buy10 = alis['bids'][10]
    binance_buy11 = alis['bids'][11]
    binance_buy12 = alis['bids'][12]
    binance_buy13 = alis['bids'][13]
    binance_buy14 = alis['bids'][14]
    binance_buy15 = alis['bids'][15]
    binance_buy16 = alis['bids'][16]
    binance_buy17 = alis['bids'][17]
    binance_buy18 = alis['bids'][18]
    binance_buy19 = alis['bids'][19]
    binance_buy20 = alis['bids'][20]
    binance_sell = alis['asks'][0]
    binance_sell1 = alis['asks'][1]
    binance_sell2 = alis['asks'][2]
    binance_sell3 = alis['asks'][3]
    binance_sell4 = alis['asks'][4]
    binance_sell5 = alis['asks'][5]
    binance_sell6 = alis['asks'][6]
    binance_sell7 = alis['asks'][7]
    binance_sell8 = alis['asks'][8]
    binance_sell9 = alis['asks'][9]
    binance_sell10 = alis['asks'][10]
    binance_sell11 = alis['asks'][11]
    binance_sell12 = alis['asks'][12]
    binance_sell13 = alis['asks'][13]
    binance_sell14 = alis['asks'][14]
    binance_sell15 = alis['asks'][15]
    binance_sell16 = alis['asks'][16]
    binance_sell17 = alis['asks'][17]
    binance_sell18 = alis['asks'][18]
    binance_sell19 = alis['asks'][19]
    binance_sell20 = alis['asks'][20]
    binance_buy_demand = float(binance_buy[1]) + float(binance_buy1[1]) \
        + float(binance_buy2[1]) + float(binance_buy3[1]) \
        + float(binance_buy4[1]) + float(binance_buy5[1]) \
        + float(binance_buy6[1]) + float(binance_buy7[1]) \
        + float(binance_buy8[1]) + float(binance_buy9[1]) \
        + float(binance_buy10[1]) + float(binance_buy11[1]) \
        + float(binance_buy12[1]) + float(binance_buy13[1]) \
        + float(binance_buy14[1]) + float(binance_buy15[1]) \
        + float(binance_buy16[1]) + float(binance_buy17[1]) \
        + float(binance_buy18[1]) + float(binance_buy19[1]) \
        + float(binance_buy20[1])
    for i in range(0, 500):
        print(alis['asks'][i][0], alis['asks'][i][1])
    binance_sell_demand = float(binance_sell[1]) \
        + float(binance_sell1[1]) + float(binance_sell2[1]) \
        + float(binance_sell3[1]) + float(binance_sell4[1]) \
        + float(binance_sell5[1]) + float(binance_sell6[1]) \
        + float(binance_sell7[1]) + float(binance_sell8[1]) \
        + float(binance_sell9[1]) + float(binance_sell10[1]) \
        + float(binance_sell11[1]) + float(binance_sell12[1]) \
        + float(binance_sell13[1]) + float(binance_sell14[1]) \
        + float(binance_sell15[1]) + float(binance_sell16[1]) \
        + float(binance_sell17[1]) + float(binance_sell18[1]) \
        + float(binance_sell19[1]) + float(binance_sell20[1])
There are 500 bids and 500 asks. After getting the data, I sum the bids and asks as shown above. I tried a for loop (the `range(0, 500)` loop in the code), but I could only print the values; I couldn't sum them inside the loop. Sample output:
0.9315 18328.6
0.9316 18201.2
0.9317 23544.0
0.9318 260.4
0.9319 689.5
0.9320 20410.5
0.9321 47.7
0.9322 294.2
0.9323 446.6
0.9324 104.0
0.9325 3802.3
0.9326 100.1
0.9327 20122.9
0.9328 1410.0
0.9329 7745.1
0.9330 9094.4
0.9331 10389.9
0.9332 248.5
0.9333 71559.7
0.9334 18024.1
0.9335 7404.5
0.9336 1366.6
0.9337 21972.4
0.9338 1224.8
0.9339 49.9
0.9340 17590.5
0.9341 17967.1
0.9342 272.3
0.9343 704.4
0.9344 3581.7
0.9345 3896.8
The first item is the price and the second is the quantity.
I am trying to write a function that sums these and computes the average price, and I also want to know the totals of the bids and of the asks.
Use the sum() function with a generator that gets the appropriate element of each list item.
binance_buy_demand = sum(float(x[1]) for x in alis['bids'])
binance_sell_demand = sum(float(x[1]) for x in alis['asks'])
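The same generator pattern also covers the average price and the counts you asked about. A self-contained sketch with a two-entry stand-in for the real 500-entry book (the volume-weighted average is one reasonable reading of "sum and divide avg. price"):

```python
# Stand-in for the structure futures_order_book returns: lists of
# [price, quantity] strings. The real `alis` has 500 entries per side.
alis = {
    "bids": [["0.9315", "18328.6"], ["0.9316", "18201.2"]],
    "asks": [["0.9320", "20410.5"], ["0.9321", "47.7"]],
}

bid_count = len(alis["bids"])                        # how many bids total
bid_volume = sum(float(q) for _, q in alis["bids"])  # total quantity
# volume-weighted average price = sum(price * qty) / sum(qty)
bid_vwap = sum(float(p) * float(q) for p, q in alis["bids"]) / bid_volume

print(bid_count, bid_volume)
```

Swap `"bids"` for `"asks"` to get the same figures for the sell side.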
You can sum in a for loop like this:
total_ask_volume = 0
total_bid_volume = 0
for i in range(0, 500):
    total_ask_volume += float(alis["asks"][i][1])
    total_bid_volume += float(alis["bids"][i][1])
print(total_ask_volume, total_bid_volume)
Another option is to skip the i index, and go through the values directly:
total_ask_volume = 0
for ask in alis["asks"]:
    total_ask_volume += float(ask[1])

total_bid_volume = 0
for bid in alis["bids"]:
    total_bid_volume += float(bid[1])

print(total_ask_volume, total_bid_volume)
This can sometimes be clearer, particularly in situations where you'd otherwise have multiple indexes (i, j, k and so on) which could get confusing.
As a side note, float often rounds in unintuitive ways; it probably doesn't matter in this particular circumstance, but in most situations involving money you probably want to use Decimal instead (or multiply by 100 and count whole numbers of cents).
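To illustrate that side note, here is the classic float surprise next to the Decimal equivalent:

```python
from decimal import Decimal

# binary floats cannot represent 0.1 or 0.2 exactly, so the error shows up:
print(0.1 + 0.2)                        # 0.30000000000000004
# Decimal does exact decimal arithmetic, which matters for money:
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```

Note that Decimal values should be constructed from strings, not floats, or the representation error is baked in before Decimal ever sees the number.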
I'm very new to coding and I'm trying (and failing) to write code that prints a Christmas-tree shape of "*" at whatever size it is instructed to. I don't know where I'm going wrong, but I keep getting a variety of syntax errors all over my code.
size_raw = input("Size of tree?")

def spaces(length, size):
    Space = " "
    new_len = size / 2
    new_len -= length
    space_s = int(Space) * int(new_len)
    return space_s

def segment(h, tw, bw, s):
    line = tw
    star = "*"
    while line <= bw:
        stars = line * star
        print (spaces(int(line), int(s)) + (stars) + (spaces(int(line), int(s))
        line += 2

def tree(size):
    Topwidth = 1
    height = 3
    while Topwidth <= size:
        bottom_width = Topwidth + height
        segment(int(height), int(Topwidth), int(bottom_width), int(size)
        height += 2

if size_raw == "Very Big":
    tree(100)
elif size_raw == "Massive":
    tree(1000)
else:
    tree(int(size_raw))
I expect it to work smoothly in its current state; however, each new attempt produces a new error in a new location.
Your parentheses are not closed properly.
Your code
print (spaces(int(line), int(s)) + (stars) + (spaces(int(line), int(s))
segment(int(height), int(Topwidth), int(bottom_width), int(size)
My version
print (spaces(int(line), int(s))) + (stars) + (spaces(int(line), int(s)))
segment(int(height), int(Topwidth), int(bottom_width), int(size))
I got it to work in this fiddle, where I also changed the indentation to 4 spaces, which is the most common convention.
I pulled a Python script off of Github which is intended to analyze & rank stocks. I finally got it running but unfortunately the EV/EBITDA and Shareholder Yield are populating their default values, 1000 & 0 respectively.
I've spent the last few days attempting to troubleshoot, learning a lot in the process, but unfortunately I've had no luck. I think it's attempting to extract data from a nonexistent line in the 'Scraper' portion, or referencing incorrect HTML. I'll paste the two code snips I think the error may lie within, though the rest of the files are linked above.
Main File
from sys import stdout
from Stock import Stock
import Pickler
import Scraper
import Rankings
import Fixer
import Writer

# HTML error code handler - importing data is a chore, and getting a connection
# error halfway through is horribly demotivating. Use a pickler to serialize
# imported data into a hot-startable database.
pklFileName = 'tmpstocks.pkl'
pickler = Pickler.Pickler()

# Check if a pickled file exists. Load it if the user requests. If no file
# loaded, stocks is an empty list.
stocks = pickler.loadPickledFile(pklFileName)

# Scrape data from FINVIZ. Certain presets have been established (see direct
# link for more details)
url = 'http://finviz.com/screener.ashx?v=152&f=cap_smallover&' + \
    'ft=4&c=0,1,2,6,7,10,11,13,14,45,65'
html = Scraper.importHtml(url)

# Parse the HTML for the number of pages from which we'll pull data
nPages = -1
for line in html:
    if line[0:40] == '<option selected="selected" value=1>Page':
        # Find indices
        b1 = line.index('/') + 1
        b2 = b1 + line[b1:].index('<')
        # Number of pages containing stock data
        nPages = int(line[b1:b2])
        break

# Parse data from table on the first page of stocks and store in the database,
# but only if no data was pickled
if pickler.source == Pickler.PickleSource.NOPICKLE:
    Scraper.importFinvizPage(html, stocks)

# The first page of stocks (20 stocks) has been imported. Now import the
# rest of them
source = Pickler.PickleSource.FINVIZ
iS = pickler.getIndex(source, 1, nPages + 1)

for i in range(iS, nPages + 1):
    try:
        # Print dynamic progress message
        print('Importing FINVIZ metrics from page ' + str(i) + ' of ' + \
            str(nPages) + '...', file=stdout, flush=True)

        # Scrape data as before
        url = 'http://finviz.com/screener.ashx?v=152&f=cap_smallover&ft=4&r=' + \
            str(i*20+1) + '&c=0,1,2,6,7,10,11,13,14,45,65'
        html = Scraper.importHtml(url)

        # Import stock metrics from page into a buffer
        bufferList = []
        Scraper.importFinvizPage(html, bufferList)

        # If no errors encountered, extend buffer to stocks list
        stocks.extend(bufferList)
    except:
        # Error encountered. Pickle stocks for later loading
        pickler.setError(source, i, stocks)
        break

# FINVIZ stock metrics successfully imported
print('\n')

# Store number of stocks in list
nStocks = len(stocks)

# Handle pickle file
source = Pickler.PickleSource.YHOOEV
iS = pickler.getIndex(source, 0, nStocks)

# Grab EV/EBITDA metrics from Yahoo! Finance
for i in range(iS, nStocks):
    try:
        # Print dynamic progress message
        print('Importing Key Statistics for ' + stocks[i].tick +
            ' (' + str(i) + '/' + str(nStocks - 1) + ') from Yahoo! Finance...', \
            file=stdout, flush=True)

        # Scrape data from Yahoo! Finance
        url = 'http://finance.yahoo.com/q/ks?s=' + stocks[i].tick + '+Key+Statistics'
        html = Scraper.importHtml(url)

        # Parse data
        for line in html:
            # Check no value
            if 'There is no Key Statistics' in line or \
                    'Get Quotes Results for' in line or \
                    'Changed Ticker Symbol' in line or \
                    '</html>' in line:
                # Non-financial file (e.g. mutual fund) or
                # Ticker not located or
                # End of html page
                stocks[i].evebitda = 1000
                break
            elif 'Enterprise Value/EBITDA' in line:
                # Line contains EV/EBITDA data
                evebitda = Scraper.readYahooEVEBITDA(line)
                stocks[i].evebitda = evebitda
                break
    except:
        # Error encountered. Pickle stocks for later loading
        pickler.setError(source, i, stocks)
        break

# Yahoo! Finance EV/EBITDA successfully imported
print('\n')

# Handle pickle file
source = Pickler.PickleSource.YHOOBBY
iS = pickler.getIndex(source, 0, nStocks)

# Grab BBY metrics from Yahoo! Finance
for i in range(iS, nStocks):
    try:
        # Print dynamic progress message
        print('Importing Cash Flow for ' + stocks[i].tick +
            ' (' + str(i) + '/' + str(nStocks - 1) + ') from Yahoo! Finance...', \
            file=stdout, flush=True)

        # Scrape data from Yahoo! Finance
        url = 'http://finance.yahoo.com/q/cf?s=' + stocks[i].tick + '&ql=1'
        html = Scraper.importHtml(url)

        # Parse data
        totalBuysAndSells = 0
        for line in html:
            # Check no value
            if 'There is no Cash Flow' in line or \
                    'Get Quotes Results for' in line or \
                    'Changed Ticker Symbol' in line or \
                    '</html>' in line:
                # Non-financial file (e.g. mutual fund) or
                # Ticker not located or
                # End of html page
                break
            elif 'Sale Purchase of Stock' in line:
                # Line contains Sale/Purchase of Stock information
                totalBuysAndSells = Scraper.readYahooBBY(line)
                break

        # Calculate BBY as a percentage of current market cap
        bby = round(-totalBuysAndSells / stocks[i].mktcap * 100, 2)
        stocks[i].bby = bby
    except:
        # Error encountered. Pickle stocks for later loading
        pickler.setError(source, i, stocks)
        break

# Yahoo! Finance BBY successfully imported

if not pickler.hasErrorOccurred:
    # All data imported
    print('\n')
    print('Fixing screener errors...')

    # A number of stocks may have broken metrics. Fix these (i.e. assign out-of-
    # bounds values) before sorting
    stocks = Fixer.fixBrokenMetrics(stocks)

    print('Ranking stocks...')

    # Calculate shareholder Yield
    for i in range(nStocks):
        stocks[i].shy = stocks[i].div + stocks[i].bby

    # Time to rank! Lowest value gets 100
    rankPE = 100 * (1 - Rankings.rankByValue([o.pe for o in stocks]) / nStocks)
    rankPS = 100 * (1 - Rankings.rankByValue([o.ps for o in stocks]) / nStocks)
    rankPB = 100 * (1 - Rankings.rankByValue([o.pb for o in stocks]) / nStocks)
    rankPFCF = 100 * (1 - Rankings.rankByValue([o.pfcf for o in stocks]) / nStocks)
    rankEVEBITDA = 100 * (1 - Rankings.rankByValue([o.evebitda for o in stocks]) / nStocks)

    # Shareholder yield ranked with highest getting 100
    rankSHY = 100 * (Rankings.rankByValue([o.shy for o in stocks]) / nStocks)

    # Rank total stock valuation
    rankStock = rankPE + rankPS + rankPB + rankPFCF + rankEVEBITDA + rankSHY

    # Rank 'em
    rankOverall = Rankings.rankByValue(rankStock)

    # Calculate Value Composite - higher the better
    valueComposite = 100 * rankOverall / len(rankStock)

    # Reverse indices - lower index -> better score
    rankOverall = [len(rankStock) - 1 - x for x in rankOverall]

    # Assign to stocks
    for i in range(nStocks):
        stocks[i].rank = rankOverall[i]
        stocks[i].vc = round(valueComposite[i], 2)

    print('Sorting stocks...')

    # Sort all stocks by normalized rank
    stocks = [x for (y, x) in sorted(zip(rankOverall, stocks))]

    # Sort top decile by momentum factor. O'Shaughnessey historically uses 25
    # stocks to hold. The top decile is printed, and the user may select the top 25
    # (or any n) from the .csv file.
    dec = int(nStocks / 10)
    topDecile = []

    # Store temporary momentums from top decile for sorting reasons
    moms = [o.mom for o in stocks[:dec]]

    # Sort top decile by momentum
    for i in range(dec):
        # Get index of top momentum performer in top decile
        topMomInd = moms.index(max(moms))
        # Sort
        topDecile.append(stocks[topMomInd])
        # Remove top momentum performer from further consideration
        moms[topMomInd] = -100

    print('Saving stocks...')

    # Save momentum-weighted top decile
    topCsvPath = 'top.csv'
    Writer.writeCSV(topCsvPath, topDecile)

    # Save results to .csv
    allCsvPath = 'stocks.csv'
    Writer.writeCSV(allCsvPath, stocks)

    print('\n')
    print('Complete.')
    print('Top decile (sorted by momentum) saved to: ' + topCsvPath)
    print('All stocks (sorted by trending value) saved to: ' + allCsvPath)
Scraper
import re
from urllib.request import urlopen
from Stock import Stock

def importHtml(url):
    "Scrapes the HTML file from the given URL and returns line-break-delimited strings"
    response = urlopen(url, data=None)
    html = response.read().decode('utf-8').split('\n')
    return html

def importFinvizPage(html, stocks):
    "Imports data from a FINVIZ HTML page and stores it in the list of Stock objects"
    isFound = False
    for line in html:
        if line[0:15] == '<td height="10"':
            isFound = True
            # Import data line into stock database
            _readFinvizLine(line, stocks)
        if isFound and len(line) < 10:
            break
    return

def _readFinvizLine(line, stocks):
    "Imports stock metrics from the data line and stores them in the list of Stock objects"
    # Parse html
    (stkraw, dl) = _parseHtml(line)

    # Create new stock object
    stock = Stock()

    # Get ticker symbol
    stock.tick = stkraw[dl[1] + 1 : dl[2]]
    # Get company name
    stock.name = stkraw[dl[2] + 1 : dl[3]]

    # Get market cap multiplier (either MM or BB)
    if stkraw[dl[4] - 1] == 'B':
        capmult = 1000000000
    else:
        capmult = 1000000

    # Get market cap
    stock.mktcap = capmult * _toFloat(stkraw[dl[3] + 1 : dl[4] - 1])
    # Get P/E ratio
    stock.pe = _toFloat(stkraw[dl[4] + 1 : dl[5]])
    # Get P/S ratio
    stock.ps = _toFloat(stkraw[dl[5] + 1 : dl[6]])
    # Get P/B ratio
    stock.pb = _toFloat(stkraw[dl[6] + 1 : dl[7]])
    # Get P/FCF ratio
    stock.pfcf = _toFloat(stkraw[dl[7] + 1 : dl[8]])
    # Get Dividend Yield
    stock.div = _toFloat(stkraw[dl[8] + 1 : dl[9] - 1])
    # Get 6-mo Relative Price Strength
    stock.mom = _toFloat(stkraw[dl[9] + 1 : dl[10] - 1])
    # Get Current Stock Price
    stock.price = _toFloat(stkraw[dl[11] + 1 : dl[12]])

    # Append stock to list of stocks
    stocks.append(stock)
    return

def _toFloat(line):
    "Converts a string to a float. Returns NaN if the line can't be converted"
    try:
        num = float(line)
    except:
        num = float('NaN')
    return num

def readYahooEVEBITDA(line):
    "Returns EV/EBITDA data from a Yahoo! Finance HTML line"
    # Parse html
    (stkraw, dl) = _parseHtml(line)

    for i in range(0, len(dl)):
        if (stkraw[dl[i] + 1 : dl[i] + 24] == 'Enterprise Value/EBITDA'):
            evebitda = stkraw[dl[i + 1] + 1 : dl[i + 2]]
            break

    return _toFloat(evebitda)

def readYahooBBY(line):
    "Returns total buys and sells from a Yahoo! Finance HTML line. The result will still need to be divided by market cap"
    # Line also contains Borrowings details - Remove it all
    if 'Net Borrowings' in line:
        # Remove extra data
        line = line[:line.find('Net Borrowings')]

    # Trim prior data
    line = line[line.find('Sale Purchase of Stock'):]

    # Determine if buys or sells, replace open parentheses:
    # (#,###) -> -#,###
    line = re.sub(r'[(]', '-', line)
    # Eliminate commas and close parentheses: -#,### -> -####
    line = re.sub(r'[,|)]', '', line)

    # Remove HTML data and markup, replacing with commas
    line = re.sub(r'[<.*?>|]', ',', line)
    line = re.sub(' ', ',', line)

    # Locate the beginnings of each quarterly Sale Purchase point
    starts = [m.start() for m in re.finditer(r',\d+,|,.\d+', line)]
    # Locate the ends of each quarterly Sale Purchase point
    ends = [m.start() for m in re.finditer(r'\d,', line)]

    # Sum all buys and sells across the year
    tot = 0
    for i in range(0, len(starts)):
        # x1000 because all numbers are in thousands
        tot = tot + float(line[starts[i] + 1 : ends[i] + 1]) * 1000

    return tot

def _parseHtml(line):
    "Parses the HTML line by </td> breaks and returns the delimited string"
    # Replace </td> breaks with placeholder, '`'
    ph = '`'
    rem = re.sub('</td>', ph, line)

    # The ticker symbol initial delimiter is different
    # Remove all other remaining HTML data
    stkraw = re.sub('<.*?>', '', rem)

    # Replace unbalanced HTML
    stkraw = re.sub('">', '`', stkraw)

    # Find the placeholders
    dl = [m.start() for m in re.finditer(ph, stkraw)]

    return (stkraw, dl)
If anyone has any input, or perhaps a better method such as BeautifulSoup, I'd really appreciate it! I'm very open to any tutorials that would help as well. My intent is both to better my programming ability and to have an effective stock screener.
I was having the same issue scraping the Yahoo data in Python, and in Matlab as well. As a workaround, I wrote a macro in VBA to grab all of the EV/EBITDA data from Yahoo by visiting each stock's Key Statistics page. However, it takes about a day to run on all 3,000+ stocks with market caps over $200M, which is not really practical.
I've tried finding the EV/EBITDA on various stock screeners online, but they either don't report it or only let you download a couple hundred stocks' data without paying. Busy Stock's screener seems the best in this regard, but their EV/EBITDA figures don't line up to Yahoo's, which worries me that they are using different methodology.
One solution and my recommendation to you is to use the Trending Value algorithm in Quantopian, which is free. You can find the code here: https://www.quantopian.com/posts/oshaugnessy-what-works-on-wall-street
Quantopian will let you backtest the algorithm to 2002, and live test it as well.
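On the BeautifulSoup question: any structured HTML parser is more robust than index arithmetic on raw strings once the markup shifts, which is likely what broke the EV/EBITDA scrape. A stdlib-only sketch of the idea (the HTML fragment is invented for illustration; real Yahoo! pages differ and change over time):

```python
from html.parser import HTMLParser

class TdText(HTMLParser):
    """Collects the text content of every <td> cell, in document order."""
    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data

parser = TdText()
parser.feed('<tr><td>Enterprise Value/EBITDA</td><td>8.30</td></tr>')

# the value cell follows its label cell in the flattened list
ev_ebitda = float(parser.cells[parser.cells.index("Enterprise Value/EBITDA") + 1])
```

Looking a value up by its label this way survives cosmetic changes to tags and attributes, where fixed character offsets like `dl[i] + 24` do not.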
I want to read a binary file (K7120127.SRS) whose characteristics are detailed in chapter 2.2 of the Word file (Documentacion SRS DATA.doc) available at the following link:
https://drive.google.com/folderview?id=0B_NlxFaQkpgHb00yTm5kU0MyaUU&usp=sharing
The link also includes a viewer for that data (Srsdisp.exe), but I want to process this data, not just view it, which is why I'd like to read it in Python.
I know how to plot using matplotlib, but working with binary files is new to me. I'd like to plot something like this (that plot was made using the viewer included in the link).
Try this:
from struct import unpack

# constants from the file spec
RECORD_SIZE = 826
RECORD_HEADER_SIZE = 24
RECORD_ARRAY_SIZE = 401

# verbosity values
VERBOSITY_ALL = 2     # print warnings and errors
VERBOSITY_ERRORS = 1  # print errors
VERBOSITY_NONE = 0    # print nothing

class SRSRecord:
    """Holds one 826 byte SRS Record."""

    _site_to_name = {
        1: "Palehua",
        2: "Holloman",
        3: "Learmonth",
        4: "San Vito",
        # add new site names here ..
    }

    def __init__(self):
        self.year = None
        self.month = None
        self.day = None
        self.hour = None
        self.minute = None
        self.seconds = None
        self.site_number = None
        self.site_name = None
        self.n_bands_per_record = None
        self.a_start_freq = None
        self.a_end_freq = None
        self.a_num_bytes = None
        self.a_analyser_reference_level = None
        self.a_analyser_attenuation = None
        self.b_start_freq = None
        self.b_end_freq = None
        self.b_num_bytes = None
        self.b_analyser_reference_level = None
        self.b_analyser_attenuation = None
        # dictionary that maps frequency in mega hertz to level
        self.a_values = {}
        # dictionary that maps frequency in mega hertz to level
        self.b_values = {}
        return

    def _parse_srs_file_header(self, header_bytes, verbosity = VERBOSITY_ALL):
        fields = unpack(
            # General header information
            '>'   # (data packed in big endian format)
            'B'   # 1 Year (last 2 digits) Byte integer (unsigned)
            'B'   # 2 Month number (1 to 12) "
            'B'   # 3 Day (1 to 31) "
            'B'   # 4 Hour (0 to 23 UT) "
            'B'   # 5 Minute (0 to 59) "
            'B'   # 6 Second at start of scan (0 to 59) "
            'B'   # 7 Site Number (0 to 255) "
            'B'   # 8 Number of bands in the record (2) "

            # Band 1 (A-band) header information
            'h'   # 9,10 Start Frequency (MHz) Word integer (16 bits)
            'H'   # 11,12 End Frequency (MHz) "
            'H'   # 13,14 Number of bytes in data record (401) "
            'B'   # 15 Analyser reference level Byte integer
            'B'   # 16 Analyser attenuation (dB) "

            # Band 2 (B-band) header information
            # 17-24 As for band 1
            'H'   # 17,18 Start Frequency (MHz) Word integer (16 bits)
            'H'   # 19,20 End Frequency (MHz) "
            'H'   # 21,22 Number of bytes in data record (401) "
            'B'   # 23 Analyser reference level Byte integer
            'B',  # 24 Analyser attenuation (dB) "
            header_bytes)

        self.year = fields[0]
        self.month = fields[1]
        self.day = fields[2]
        self.hour = fields[3]
        self.minute = fields[4]
        self.seconds = fields[5]

        # read the site number and work out the site name
        self.site_number = fields[6]
        if self.site_number not in SRSRecord._site_to_name.keys():
            # got an unknown site number.. complain a bit..
            if verbosity >= VERBOSITY_ALL:
                print("Unknown site number: %s" % self.site_number)
                print("A list of known site numbers follows:")
                for site_number, site_name in SRSRecord._site_to_name.items():
                    print("\t%s: %s" % (site_number, site_name))
            # then set the site name to unknown.
            self.site_name = "UnknownSite"
        else:
            # otherwise look up the site using our lookup table
            self.site_name = SRSRecord._site_to_name[self.site_number]

        # read the number of bands
        self.n_bands_per_record = fields[7]  # should be 2
        if self.n_bands_per_record != 2 and verbosity >= VERBOSITY_ERRORS:
            print("Warning.. record has %s bands, expecting 2!" % self.n_bands_per_record)

        # read the a record meta data
        self.a_start_freq = fields[8]
        self.a_end_freq = fields[9]
        self.a_num_bytes = fields[10]
        if self.a_num_bytes != 401 and verbosity >= VERBOSITY_ERRORS:
            print("Warning.. record has %s bytes in the a array, expecting 401!" %
                  self.a_num_bytes)
        self.a_analyser_reference_level = fields[11]
        self.a_analyser_attenuation = fields[12]

        # read the b record meta data
        self.b_start_freq = fields[13]
        self.b_end_freq = fields[14]
        self.b_num_bytes = fields[15]
        if self.b_num_bytes != 401 and verbosity >= VERBOSITY_ERRORS:
            print("Warning.. record has %s bytes in the b array, expecting 401!" %
                  self.b_num_bytes)
        self.b_analyser_reference_level = fields[16]
        self.b_analyser_attenuation = fields[17]
        return

    def _parse_srs_a_levels(self, a_bytes):
        # unpack the frequency/levels from the first array
        for i in range(401):
            # freq equation from the srs file format spec
            freq_a = 25 + 50 * i / 400.0
            level_a = unpack('>B', a_bytes[i])[0]
            self.a_values[freq_a] = level_a
        return

    def _parse_srs_b_levels(self, b_bytes):
        for i in range(401):
            # freq equation from the srs file format spec
            freq_b = 75 + 105 * i / 400.0
            level_b = unpack('>B', b_bytes[i])[0]
            self.b_values[freq_b] = level_b
        return

    def __str__(self):
        return ("%s/%s/%s, %s:%s:%s site: %s/%s bands: %s "
                "[A %s->%s MHz ref_level: %s atten: %s dB], "
                "[B %s->%s MHz ref_level: %s atten: %s dB]"
                ) % (
            self.day, self.month, self.year,
            self.hour, self.minute, self.seconds,
            self.site_number, self.site_name,
            self.n_bands_per_record,
            self.a_start_freq, self.a_end_freq,
            self.a_analyser_reference_level, self.a_analyser_attenuation,
            self.b_start_freq, self.b_end_freq,
            self.b_analyser_reference_level, self.b_analyser_attenuation,
        )

    def _dump(self, values):
        freqs = values.keys()
        freqs.sort()
        for freq in freqs:
            print "%5s %s" % (freq, values[freq])
        return

    def dump_a(self):
        self._dump(self.a_values)
        return

    def dump_b(self):
        self._dump(self.b_values)
        return

def read_srs_file(fname):
    """Parses an srs file and returns a list of SRSRecords."""
    # keep the records we read in here
    srs_records = []
    f = open(fname, "rb")
    while True:
        # read raw record data
        record_data = f.read(RECORD_SIZE)

        # if the length of the record data is zero we've reached the end of the data
        if len(record_data) == 0:
            break

        # break up the record bytes into header, array a and array b bytes
        header_bytes = record_data[:RECORD_HEADER_SIZE]
        a_bytes = record_data[RECORD_HEADER_SIZE : RECORD_HEADER_SIZE + RECORD_ARRAY_SIZE]
        b_bytes = record_data[RECORD_HEADER_SIZE + RECORD_ARRAY_SIZE :
                              RECORD_HEADER_SIZE + 2 * RECORD_ARRAY_SIZE]

        # make a new srs record
        record = SRSRecord()
        record._parse_srs_file_header(header_bytes, verbosity = VERBOSITY_ERRORS)
        record._parse_srs_a_levels(a_bytes)
        record._parse_srs_b_levels(b_bytes)
        srs_records.append(record)

    return srs_records

if __name__ == "__main__":
    # parse the file.. (this is where the magic happens ;)
    srs_records = read_srs_file(fname = "K7120127.SRS")

    # play with the data
    for i in range(3):
        print srs_records[i]

    r0 = srs_records[0]
    r0.dump_a()
    r0.dump_b()
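To get from the parsed records to a plot like the viewer's, one option is to stack each record's A-band levels into a 2-D intensity array (time on one axis, frequency on the other) and hand it to something like matplotlib's imshow(). A small stdlib-only sketch of the data arrangement, assuming the SRSRecord interface above (only the a_values dict is used):

```python
def a_band_image(records):
    """Rows are records in time order; columns are levels by rising frequency."""
    image = []
    for rec in records:
        freqs = sorted(rec.a_values)             # 25..75 MHz axis
        image.append([rec.a_values[f] for f in freqs])
    return image

# e.g. plt.imshow(a_band_image(srs_records), aspect="auto", origin="lower")
```

Each row is one 826-byte record's scan, so the result is a spectrogram-style array with one row per scan time.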
I have a make-like file that needs to be parsed. The grammar is something like:
samtools=/path/to/samtools
picard=/path/to/picard

task1:
    des: description
    path: /path/to/task1
    para: [$global.samtools,
           $args.input,
           $path
          ]

task2: task1
Here $global contains the variables defined in the global scope, $path is a 'local' variable, and $args contains the key/value pairs passed in by users.
I would like to parse this file with some Python library, preferably one that returns a parse tree and reports errors when there are any. I found CodeTalker and yeanpypa. Can they be used in this case? Any other recommendations?
I had to guess what your makefile structure allows based on your example, but this should get you close:
from pyparsing import *

# elements of the makefile are delimited by line, so we must
# define skippable whitespace to include just spaces and tabs
ParserElement.setDefaultWhitespaceChars(' \t')
NL = LineEnd().suppress()

EQ,COLON,LBRACK,RBRACK = map(Suppress, "=:[]")

identifier = Word(alphas+'_', alphanums)
symbol_assignment = Group(identifier("name") + EQ + empty +
                          restOfLine("value"))("symbol_assignment")
symbol_ref = Word("$",alphanums+"_.")

def only_column_one(s,l,t):
    if col(l,s) != 1:
        raise ParseException(s,l,"not in column 1")

# task identifiers have to start in column 1
task_identifier = identifier.copy().setParseAction(only_column_one)

task_description = "des:" + empty + restOfLine("des")
task_path = "path:" + empty + restOfLine("path")
task_para_body = delimitedList(symbol_ref)
task_para = "para:" + LBRACK + task_para_body("para") + RBRACK
task_para.ignore(NL)

task_definition = Group(task_identifier("target") + COLON +
                        Optional(delimitedList(identifier))("deps") + NL +
                        (
                            Optional(task_description + NL) &
                            Optional(task_path + NL) &
                            Optional(task_para + NL)
                        )
                        )("task_definition")

makefile_parser = ZeroOrMore(
    symbol_assignment |
    task_definition |
    NL
)

if __name__ == "__main__":
    test = """\
samtools=/path/to/samtools
picard=/path/to/picard

task1:
    des: description
    path: /path/to/task1
    para: [$global.samtools,
           $args.input,
           $path
          ]

task2: task1
"""

    # dump out what we parsed, including results names
    for element in makefile_parser.parseString(test):
        print element.getName()
        print element.dump()
        print
Prints:
symbol_assignment
['samtools', '/path/to/samtools']
- name: samtools
- value: /path/to/samtools
symbol_assignment
['picard', '/path/to/picard']
- name: picard
- value: /path/to/picard
task_definition
['task1', 'des:', 'description ', 'path:', '/path/to/task1 ', 'para:',
'$global.samtools', '$args.input', '$path']
- des: description
- para: ['$global.samtools', '$args.input', '$path']
- path: /path/to/task1
- target: task1
task_definition
['task2', 'task1']
- deps: ['task1']
- target: task2
The dump() output shows you what names you can use to get at the fields within the parsed elements, or to distinguish what kind of element you have. dump() is a handy, generic tool to output whatever pyparsing has parsed. Here is some code that is more specific to your particular parser, showing how to use the field names as either dotted object references (element.target, element.deps, element.name, etc.) or dict-style references (element[key]):
for element in makefile_parser.parseString(test):
    if element.getName() == 'task_definition':
        print "TASK:", element.target,
        if element.deps:
            print "DEPS:(" + ','.join(element.deps) + ")"
        else:
            print
        for key in ('des', 'path', 'para'):
            if key in element:
                print "  ", key.upper()+":", element[key]
    elif element.getName() == 'symbol_assignment':
        print "SYM:", element.name, "->", element.value
prints:
SYM: samtools -> /path/to/samtools
SYM: picard -> /path/to/picard
TASK: task1
   DES: description
   PATH: /path/to/task1
   PARA: ['$global.samtools', '$args.input', '$path']
TASK: task2 DEPS:(task1)
I've used pyparsing in the past and been immensely pleased with it (q.v., the pyparsing project site).