Error with Entity matching package deepmatcher - python

I am getting the error below:

ValueError                                Traceback (most recent call last)
<ipython-input-6-2d323ffe212f> in <module>()
----> 1 train, validation, test = dm.data.process(path='/content/', train='train.csv', validation='validation.csv', test='test.csv')

/usr/local/lib/python3.7/dist-packages/deepmatcher/data/process.py in _check_header(header, id_attr, left_prefix, right_prefix, label_attr, ignore_columns)
     32     if attr not in (id_attr, label_attr) and attr not in ignore_columns:
     33         if not attr.startswith(left_prefix) and not attr.startswith(right_prefix):
---> 34             raise ValueError('Attribute ' + attr + ' is not a left or a right table '
     35                              'column, not a label or id and is not ignored. Not sure '
     36                              'what it is...')

ValueError: Attribute ltable_id is not a left or a right table column, not a label or id and is not ignored. Not sure what it is...
I am using the
http://pages.cs.wisc.edu/~anhai/data1/deepmatcher_data/Textual/Company/company_exp_data.zip
dataset for this exercise, because a previous test with my own dataset gave the same error.
Code:
import deepmatcher as dm
train, validation, test = dm.data.process(path='/content/', train='train.csv', validation='validation.csv', test='test.csv')
That's it. I am following this repo: github.com/anhaidgroup/deepmatcher
Looking for a better understanding and a possible fix. Thanks in advance.

I didn't test it, but the error message suggests that deepmatcher needs specially named columns to work, and a first look at the repo's page confirms it: the example table there has columns named Left ... and Right ....
There is also a link to the Get Started guide, in which you can see:
Step 1. Process labeled data
"Left" attributes (required): ...
These column names are expected to be prefixed with "left_" by default.
"Right" attributes (required): ...
These column names are expected to be prefixed with "right_" by default.
This shows that the columns need the prefixes left_ and right_, but your data has columns ltable_id, rtable_id. So you have to rename the columns after you load the data and before you use it with deepmatcher, as sketched below.
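A minimal sketch of that rename, assuming pandas is available and the CSVs live in /content/ (the file names and the in-place overwrite are illustrative):
import pandas as pd

# Map the dataset's ltable_/rtable_ prefixes onto deepmatcher's defaults.
for name in ('train.csv', 'validation.csv', 'test.csv'):
    df = pd.read_csv('/content/' + name)
    df.columns = [c.replace('ltable_', 'left_').replace('rtable_', 'right_')
                  for c in df.columns]
    df.to_csv('/content/' + name, index=False)  # overwrite with renamed columns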
EDIT:
The repo also links to a Data Processing page, where you can see:
"Left" attributes (required): ...
This can be customized by setting the left_prefix parameter (e.g., use "ltable_" as the prefix).
"Right" attributes (required): ...
This can be customized by setting the right_prefix parameter (e.g., use "rtable_" as the prefix).
and it shows example code
dm.data.process(... left_prefix='left_', right_prefix='right_', ...)
which means you can do
dm.data.process(... left_prefix='ltable_', right_prefix='rtable_', ...)
EDIT:
I tested it and it resolves the problem with company_exp_data.zip:
import deepmatcher as dm

train, validation, test = dm.data.process(
    path='/content/',
    #path='exp_data',
    train='train.csv',
    validation='valid.csv',
    test='test.csv',
    left_prefix='ltable_',
    right_prefix='rtable_',
)
But then it runs into a different problem:
RuntimeError: Google drive link https://drive.google.com/uc?export=download&id=1Vih8gAmgBnuYDxfblbT94P6WjB7s1ZSh is currently unavailable, because the quota was exceeded.
It tries to download some data from Google Drive, but the quota was exceeded.
It may be necessary to download that file manually and change the source code to load it from the local machine, but that is a matter for a new question. Alternatively, this problem could be reported to the author of this module, who could host the data on another server and update the source code.
Summarizing: the root of your problem is that you didn't read the documentation.

Related

Why can't the Bloomberg API (blp) fetch the data with the correct field id?

I want to use Python to fetch the "ESG disclosure score" data from Bloomberg. This is my sample code:
from xbbg import blp

#1
blp.bdp(tickers='AAPL US Equity',
        flds=['Security_Name', 'GICS_Sector_Name', 'PX_last', 'HISTORICAL_MARKET_CAP'])

#2
blp.bdh(
    tickers='AAPL US Equity',
    flds=['Security_Name', 'HISTORICAL_MARKET_CAP', 'ESG_DISCLOSURE_SCORE'],
    start_date='2010-1-1', end_date='2020-12-31',
)
The first call executes correctly. But the second one is wrong: the "ESG disclosure score" column disappears from the output! It seems it can't fetch that data from Bloomberg, and I have made sure the flds name for the ESG disclosure score is correct.
Can somebody help me or suggest a way to solve this problem?
Thank you!
The Bloomberg Terminal FLDS function should be the first port of call for any questions about the fields that might be available, their format, and any possible overrides. In a Terminal window, type:
AAPL US <Equity Yellow Key> FLDS <Go>
Where it says "Enter Query", type ESG_DISCLOSURE_SCORE and hit <Go>. Clicking on the returned match opens the field's description screen.
If you scroll down the text, you see:
API:
current value available, historical values available
So this field does, in general, provide history via bdh and bdp. The API (sometimes called DAPI by Bloomberg) is usually what Excel, xbbg, or client-based code is using to get data from Bloomberg. If you can't see the data in this screen, you can't get it via xbbg.
BUT, look at the current value for ESG_DISCLOSURE_SCORE for this security: it is N.A. That means that while the field exists, it has not been populated for AAPL. The text description of the field gives more information.
A feature of xbbg is that if a historical field has no data, it is not included in the returned dataframe of bdh at all (not even as a column of NaN). That is why your dataframe does not have an 'ESG_DISCLOSURE_SCORE' column.
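As a defensive pattern you can check which requested fields actually came back before selecting columns. A small sketch, assuming xbbg's usual (ticker, field) column MultiIndex in the bdh result:
from xbbg import blp

df = blp.bdh(
    tickers='AAPL US Equity',
    flds=['HISTORICAL_MARKET_CAP', 'ESG_DISCLOSURE_SCORE'],
    start_date='2010-01-01', end_date='2020-12-31',
)
# bdh silently drops fields with no history, so verify before indexing.
returned = {fld for _, fld in df.columns}
missing = {'HISTORICAL_MARKET_CAP', 'ESG_DISCLOSURE_SCORE'} - returned
if missing:
    print('No history returned for:', sorted(missing))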

How to solve "ECitMatch() got multiple values for argument 'bdata'"?

I am new to the bioservices Python package. I am going to use it to retrieve PMIDs for two citations, given the specified information, and this is the code I have tried:
from bioservices import EUtils
s = EUtils()
print(s.ECitMatch("pubmed",retmode="xml", bdata="proc+natl+acad+sci+u+s+a|1991|88|3248|mann+bj|Art1|%0Dscience|1987|235|182|palmenberg+ac|Art2|"))
But it raises an error:
"TypeError: ECitMatch() got multiple values for argument 'bdata'".
Could anyone help me to solve this problem?
I think the issue is that you pass an unnamed (positional) argument, "pubmed". If you look at the source code, the first parameter of ECitMatch() is bdata; when you call it the way you do, it is ambiguous whether bdata should be "pubmed" (positional) or the keyword argument bdata, hence the error.
You can reproduce it with this minimal example:
def dummy(a, b):
    return a, b

dummy(10, a=3)
will return
TypeError: dummy() got multiple values for argument 'a'
If you remove "pubmed", the error disappears; however, the output is still incomplete:
from bioservices import EUtils
s = EUtils()
print(s.ECitMatch("proc+natl+acad+sci+u+s+a|1991|88|3248|mann+bj|Art1|%0Dscience|1987|235|182|palmenberg+ac|Art2|"))
returns
'proc+natl+acad+sci+u+s+a|1991|88|3248|mann+bj|Art1|2014248\n'
so only the first publication is taken into account. You can get the results for both by using the correct carriage return character \r:
print(s.ECitMatch(bdata="proc+natl+acad+sci+u+s+a|1991|88|3248|mann+bj|Art1|\rscience|1987|235|182|palmenberg+ac|Art2|"))
will return
proc+natl+acad+sci+u+s+a|1991|88|3248|mann+bj|Art1|2014248
science|1987|235|182|palmenberg+ac|Art2|3026048
I think you have to specify neither retmode nor the database (pubmed); in the source code linked above you can see:
query = "ecitmatch.cgi?db=pubmed&retmode=xml"
so it seems it always uses pubmed and xml.
Two issues here: one syntactic and one a bug.
The correct syntax is:
from bioservices import EUtils
s = EUtils()
query = "proc+natl+acad+sci+u+s+a|1991|88|3248|mann+bj|Art1|%0Dscience|1987|235|182|palmenberg+ac|Art2|"
print(s.ECitMatch(query))
Indeed, the underlying service behind ECitMatch has only one database (pubmed) and one format (xml); hence, those two parameters are not exposed: they are hard-coded. Therefore, only one argument is required: your query.
As for the second issue, as pointed out above and reported on the bioservices issues page, your query would return only one publication. This was a bug: the special character %0D (in place of a carriage return) was not being interpreted correctly in the URL request. The carriage-return character (either \n, \r or %0d) is now handled in the latest version on GitHub, or from PyPI if you use version 1.7.5.
Thanks to willigot for filing the issue on the bioservices page and bringing it to my attention.
Disclaimer: I'm the main author of bioservices.

pandas data mining from Eurostat

I'm starting some work analysing data from statistics institutions like Eurostat using Python, and therefore pandas. I found out there are two methods to get data from Eurostat:
pandas_datareader: it seems very easy to use, but I found some problems getting certain specific data
pandasdmx: I found it a bit complicated, but it seems a promising solution, although the documentation is poor
I use a free online Azure notebook, but I don't think that complicates my situation.
Let me explain the problems with pandas_datareader. In the API section of the pandas documentation, this package is briefly documented, and it works. Apart from the example shown there, which works nicely, a problem arises with other tables. For example, I can get data about European house prices, whose table ID is prc_hpi_a, with this simple code:
import pandas_datareader.data as web
import datetime

df = web.DataReader('prc_hpi_a', 'eurostat')
But the table has three types of data about dwellings: TOTAL, EXISTING and NEW. I got only existing dwellings, and I don't know how to get the other ones. Do you have a solution for this kind of filtering?
Secondly, there is the pandasdmx path. Here it is more complicated. My idea is to load all the data into a pandas DataFrame and then analyse it as I want. Easy to say, but I've not found many tutorials that explain this step: loading data into pandas structures. For example, I found this tutorial, but I'm stuck at the first step, that is, instantiating a client:
import pandasdmx
from pandasdmx import client
#estat=client('Eurostat', 'milk.db')
and it returns:
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input> in <module>()
      1 import pandasdmx
----> 2 from pandasdmx import client
      3 estat=client('Eurostat', 'milk.db')

ImportError: cannot import name 'client'
What's the problem here? I've looked around but found no answer to this problem.
I also followed this tutorial:
from pandasdmx import Request
estat = Request('ESTAT')
metadata = estat.datastructure('DSD_une_rt_a').write()
metadata.codelist.iloc[8:18]
resp = estat.data('une_rt_a', key={'GEO': 'EL+ES+IE'}, params={'startPeriod': '2007'})
data = resp.write(s for s in resp.data.series if s.key.AGE == 'TOTAL')
data.columns.names
data.columns.levels
data.loc[:, ('PC_ACT', 'TOTAL', 'T')]
I got the data, but my purpose is to load it into a pandas structure (Series, DataFrame, etc.), so I can handle it easily for my work. How do I do that?
Actually I did it with this working line (after the previous ones):
s = pd.DataFrame(data)
But it doesn't work if I try to get other data tables. Let me explain with another example, about the Harmonised Index of Consumer Prices table:
estat = Request('ESTAT')
metadata = estat.datastructure('DSD_prc_hicp_midx').write()
resp = estat.data('prc_hicp_midx')
data = resp.write(s for s in resp.data.series if s.key.COICOP == 'CP00')
It returns an error here, that is:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
      2 metadata = estat.datastructure('DSD_prc_hicp_midx').write()
      3 resp = estat.data('prc_hicp_midx')
----> 4 data = resp.write(s for s in resp.data.series if s.key.COICOP == 'CP00')
      5 #metadata.codelist
      6 #data.loc[:, ('TOTAL', 'INX_Q','EA', 'Q')]

~/anaconda3_501/lib/python3.6/site-packages/pandasdmx/api.py in __getattr__(self, name)
    622         Make Message attributes directly readable from Response instance
    623         '''
--> 624         return getattr(self.msg, name)
    625
    626     def _init_writer(self, writer):

AttributeError: 'DataMessage' object has no attribute 'data'
Why does it not get the data now? What's wrong here?
I lost almost a day looking around for clear examples and explanations. Do you have some to propose? Is there full and clear documentation? I also found this page with other examples explaining the use of categorical schemes, but it is not for Eurostat (as explained at some point).
Both methods could work, apart from the issues explained above, but I also need a suggestion for a definitive method to use for querying Eurostat and many other institutions like the OECD, World Bank, etc.
Could you guide me to a definitive and working solution, even if it differs for each institution?
This is my definitive answer to my own question, and it works for every type of data collected from Eurostat. I post it here because it can be useful for many.
Let me propose some examples. They produce three pandas Series (EU_unempl, EU_GDP, EU_intRates) with data and correct time indexes:
import pandas as pd

# ---- Unemployment rate ----
dataEU_unempl = pd.read_json(
    'http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/ei_lmhr_m'
    '?geo=EA&indic=LM-UN-T-TOT&s_adj=NSA&unit=PC_ACT',
    typ='series', orient='table', numpy=True)
keys = sorted(int(v) for v in dataEU_unempl['value'].keys())
x = [dataEU_unempl['value'][str(i)] for i in range(keys[0], keys[-1] + 1)]
start = sorted(dataEU_unempl['dimension']['time']['category']['index'].keys())[keys[0]]
EU_unempl = pd.Series(x, index=pd.date_range(
    pd.to_datetime(start, format='%YM%m'),  # '%m' for the month ('%M' would be minutes)
    periods=len(x), freq='M'))

# ---- GDP ----
dataEU_GDP = pd.read_json(
    'http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/namq_10_gdp'
    '?geo=EA&na_item=B1GQ&s_adj=NSA&unit=CP_MEUR',
    typ='series', orient='table', numpy=True)
keys = sorted(int(v) for v in dataEU_GDP['value'].keys())
x = [dataEU_GDP['value'][str(i)] for i in range(keys[0], keys[-1] + 1)]
start = sorted(dataEU_GDP['dimension']['time']['category']['index'].keys())[keys[0]]
EU_GDP = pd.Series(x, index=pd.date_range(pd.Timestamp(start), periods=len(x), freq='Q'))

# ---- Money market interest rates ----
dataEU_intRates = pd.read_json(
    'http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/irt_st_m'
    '?geo=EA&intrt=MAT_ON',
    typ='series', orient='table', numpy=True)
keys = sorted(int(v) for v in dataEU_intRates['value'].keys())
x = [dataEU_intRates['value'][str(i)] for i in range(keys[0], keys[-1] + 1)]
start = sorted(dataEU_intRates['dimension']['time']['category']['index'].keys())[keys[0]]
EU_intRates = pd.Series(x, index=pd.date_range(
    pd.to_datetime(start, format='%YM%m'), periods=len(x), freq='M'))
The general solution is to not rely on overly-specific APIs like datareader and instead go to the source. You can use datareader's source code as inspiration and as a guide for how to do it. But ultimately when you need to get data from a source, you may want to directly access that source and load the data.
One very popular tool for HTTP APIs is requests. You can easily use it to load JSON data from any website or HTTP(S) service. Once you have the JSON, you can load it into Pandas. Because this solution is based on general-purpose building blocks, it is applicable to virtually any data source on the Web (as opposed to e.g. pandaSDMX, which is only applicable to SDMX data sources).
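For instance, here is a minimal sketch of that approach against the Eurostat JSON web service used elsewhere in this thread (the response layout varies by dataset, so treat the parsing as illustrative):
import requests
import pandas as pd

url = ('http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/'
       'irt_st_m?geo=EA&intrt=MAT_ON')
payload = requests.get(url).json()

# 'value' maps observation positions to numbers; 'dimension' describes the axes.
values = pd.Series(payload['value'], name='value')
print(values.head())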
Load with read_csv and multiple separators
The problem with Eurostat data from the bulk download repository is that they are tab-separated files whose first 3 columns are separated by commas. Pandas read_csv() can deal with multiple separators given as a regex if you specify engine="python". This works for some data sets, but the OP's data set also contains flags, which cannot be ignored in the last column.
# Load the house price index from the Eurostat bulk download facility
import pandas
code = "prc_hpi_a"
# Pandas read_csv can almost read it directly with a multiple separator
url = f"https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2F{code}.tsv.gz"
df = pandas.read_csv(url, sep=",|\t| [^ ]?\t", na_values=":", engine="python")
# But the last column is a character column instead of a numeric because of the
# presence of a flag ": c" illustrated in the last line of the table extract
# below
# purchase,unit,geo\time\t 2006\t 2005
# DW_EXST,I10_A_AVG,AT\t :\t :
# DW_EXST,I10_A_AVG,BE\t 83.86\t 75.16
# DW_EXST,I10_A_AVG,BG\t 87.81\t 76.56
# DW_EXST,I10_A_AVG,CY\t :\t :
# DW_EXST,I10_A_AVG,CZ\t :\t :
# DW_EXST,I10_A_AVG,DE\t100.80\t101.10
# DW_EXST,I10_A_AVG,DK\t113.85\t 91.79
# DW_EXST,I10_A_AVG,EE\t156.23\t 98.69
# DW_EXST,I10_A_AVG,ES\t109.68\t :
# DW_EXST,I10_A_AVG,FI\t : c\t : c
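One hedged way to deal with those flags afterwards, assuming the layout shown in the extract above (the first 3 columns are ids), is to strip the letter flags from the value columns and coerce them to numeric:
# Keep only the numeric part of each value cell ("100.80", ": c" -> 100.80, NaN).
value_cols = df.columns[3:]
for col in value_cols:
    df[col] = pandas.to_numeric(
        df[col].astype(str).str.extract(r"([0-9.]+)", expand=False),
        errors="coerce")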
Load with the eurostat package
There is also a Python package called eurostat which makes it possible to search for and load data sets from the bulk facility into pandas data frames.
Load a data set from the bulk facility into a data frame (reusing the code defined above):
import eurostat

df1 = eurostat.get_data_df(code)
The table of contents of the bulk download facility can be read with:
toc_url = "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=table_of_contents_en.txt"
toc2 = pandas.read_csv(toc_url, sep="\t")
# Remove white spaces at the beginning and end of strings
toc2 = toc2.applymap(lambda x: x.strip() if isinstance(x, str) else x)
or with
toc = eurostat.get_toc_df()
toc0 = (eurostat.subset_toc_df(toc, "exchange"))
The last line searches for the data sets that have "exchange" in their title.
Reshape to long format
It might be useful to reshape the Eurostat data to long format with:
import re

if any(df.columns.str.contains("time")):
    time_column = df.columns[df.columns.str.contains("time")][-1]
    # Id columns are before the time column
    id_columns = df.loc[:, :time_column].columns
    df = df.melt(id_vars=id_columns, var_name="period", value_name="value")
    # Remove "\time" from the rightmost column of the index
    df = df.rename(columns=lambda x: re.sub(r"\\time", "", x))

Trying to edit private dicom tag

I'm currently trying to edit a private DICOM tag which is causing problems with a radiotherapy treatment, using pydicom in Python. Bit of a Python newbie here, so bear with me.
The DICOM file imports correctly into Python; I've attached some of the output in the first image, from the commands
ds = dicomio.read_file("xy.dcm")
print(ds)
This returns the following data:
[image: pydicom output]
The highlighted tag is the one I need to edit.
When trying something like
ds[0x10,0x10].value
This gives the correct output:
'SABR Spine'
However, trying something along the lines of
ds[3249,1000]
or
ds[3249,1000].value
returns the following output:
Traceback (most recent call last):
  File "<pyshell#64>", line 1, in <module>
    ds[3249,1000].value
  File "C:\Users\...\dataset.py", line 317, in __getitem__
    data_elem = dict.__getitem__(self, tag)
KeyError: (0cb1, 03e8)
If I try accessing [3249,1010] via the same method, it returns a KeyError of (0cb1, 03f2).
I have tried adding the tag to the _dicom_dict.py file, as highlighted in the second image:
[image: end of _dicom_dict.py]
Have I done this right? I'm not even sure I'm accessing the tags correctly: using
ds[300a,0070]
gives me 'SyntaxError: invalid syntax', for example, even though this tag is present in the file as the Fraction Group Sequence. I have also been made aware that [3249,1000] is connected to [3249,1010] somehow; apparently, since they are proprietary tags, they cannot be edited in MATLAB, but it was suggested they could be edited in Python.
Thanks a lot
The problem is that Python reads the plain integers 3249 and 1000 as decimal, and 3249 decimal is 0x0CB1 in hexadecimal, which is exactly the tag your KeyError reports: (0cb1, 03e8). DICOM tags are conventionally written in hex, so use hex literals:
ds[0x3249,0x1000]
This looks up the tag you actually intended.
You can apparently also access them as strings:
ds['3249', '1000']
However, your real issue is that you are trying to access a data element that is nested several layers deep. Based on your output at the top, try something like:
first_list_item = ds['300a', '0070'].value[0]
for item in first_list_item['300c', '0004'].value:
    print(item['3249', '1000'])
Essentially, a data element in the top-level Dataset object can hold either a plain value or a sequence of nested Dataset objects. That makes parsing the data a little harder, but it is probably unavoidable.
Have a look at this for more info.
As Andrew Guy notes in his last comment, you need to get the first sequence item for 300a,0070. Then get the second sequence item from the 300c,0004 sequence in that item. In that sequence item, you should be able to get the 3249,1000 attribute.
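Putting that together, a hedged sketch of editing and re-saving the nested private element (the tag positions follow the description above; the file names and new value are illustrative, and the meaning of a private tag is vendor-specific):
import pydicom

ds = pydicom.dcmread('xy.dcm')
fraction_group = ds[0x300A, 0x0070].value[0]       # first Fraction Group Sequence item
beam_refs = fraction_group[0x300C, 0x0004].value   # Referenced Beam Sequence
beam_refs[1][0x3249, 0x1000].value = 'new value'   # second item's private 3249,1000
ds.save_as('xy_edited.dcm')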

How do I get the most recent message in a JSON string

How do I get the most recent message whose "reviewer" is {"name":"Klocwork Automation User"}?
INPUT:
{"project":"platform/frameworks/opt/telephony","branch":"kitkat","id":"Idcf6faee0f6259704ea07b62ce713ebdd4c5da1b","number":"739919","subject":"Correct order of parameter in iccExchangeApdu()","owner":{"name":"Satish Kumar Singh","email":"c_ssing#qca.qualcomm.com","username":"c_ssing"},"url":"https://review-android.quicinc.com/739919","createdOn":1399412660,"lastUpdated":1399418924,"sortKey":"002ce960000b4a4f","open":true,"status":"NEW","comments":[{"timestamp":1399412661,"reviewer":{"name":"Gator Service Account","email":"gator#localhost","username":"gator"},"message":"Patch Set 1: Looks good to me, but someone else must approve\n\nThis patchset has been processed by the Gator."},{"timestamp":1399412704,"reviewer":{"name":"Checkpatch Service Account","email":"checkpatch#localhost","username":"checkpatch"},"message":"Patch Set 1: Looks good to me, but someone else must approve\n\nYour change has passed all of the checks enforced by the android patchchecker."},{"timestamp":1399413456,"reviewer":{"name":"Satish Kumar Singh","email":"c_ssing#qca.qualcomm.com","username":"c_ssing"},"message":"Patch Set 1: Developer Build and Test Successful\n\n"},{"timestamp":1399415354,"reviewer":{"name":"Gueyoung Lee","email":"gueyoung#qca.qualcomm.com","username":"gueyoung"},"message":"Patch Set 1: Looks good to me, but someone else must approve\n\n"},{"timestamp":1399417092,"reviewer":{"name":"Dhananjai Singh","email":"dhananja#qca.qualcomm.com","username":"dhananja"},"message":"Patch Set 1: Looks good to me, but someone else must approve\n\n"},{"timestamp":1399417366,"reviewer":{"name":"David Ng","email":"dng#quicinc.com","username":"dng"},"message":"Patch Set 1: Looks good to me, approved\n\nI remembered the previous change went in recently. How come this was not caught in the original testing as this would have failed right away?\n\nThanks!\nDavid"},{"timestamp":1399418880,"reviewer":{"name":"Klocwork Automation User","email":"kwuser#localhost","username":"kwuser"},"message":"Patch Set 1:\n\nThis change is being verified in klocwork for the following manifests along with other changes as detailed below:\n\n\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\ngit-android.quicinc.com/platform/manifest:kk:default.xml\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\nhttps://commander.qualcomm.com/commander/pages/SimplifiedJobView/LoadComponent_run?jobId\u003d3217513\n\n--------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n\n o 
https://review-android.quicinc.com/#change,\n\n\n\nPlease note that verification of all changes in this batch need to be successful before this change can be merged.\n\nPLEASE DO NOT UPLOAD A NEW PATCH SET, OR REMOVE APPROVALS UNTIL THE VERIFICATION IS COMPLETE.\n"},{"timestamp":1399418898,"reviewer":{"name":"Linux Build Service Account","email":"lnxbuild#localhost","username":"lnxbuild"},"message":"Patch Set 1:\n\nThis change is being verified in lookahead for the following manifests along with other changes as detailed below:\n\n\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\ngit-android.quicinc.com/platform/manifest:kk:default.xml\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\nhttps://commander.qualcomm.com/commander/pages/SimplifiedJobView/LoadComponent_run?jobId\u003d3217515\n\n--------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n\n o https://review-android.quicinc.com/#change,739919\n\n\n\nPlease note that verification of all changes in this batch need to be successful before this change can be merged.\n\nPLEASE DO NOT UPLOAD A NEW PATCH SET, OR REMOVE APPROVALS UNTIL THE VERIFICATION IS COMPLETE.\n"},{"timestamp":1399418924,"reviewer":{"name":"Linux Build Service Account","email":"lnxbuild#localhost","username":"lnxbuild"},"message":"Patch Set 1:\n\nThis change is being verified in lookahead for the following manifests along with other changes as detailed 
below:\n\n\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\ngit-android.quicinc.com/platform/manifest:kk:default.xml\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\nhttps://commander.qualcomm.com/commander/pages/SimplifiedJobView/LoadComponent_run?jobId\u003d3217517\n\n--------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n\n o https://review-android.quicinc.com/#change,739919\n\n\n\nPlease note that verification of all changes in this batch need to be successful before this change can be merged.\n\nPLEASE DO NOT UPLOAD A NEW PATCH SET, OR REMOVE APPROVALS UNTIL THE VERIFICATION IS COMPLETE.\n"}]}
{"type":"stats","rowCount":1,"runTimeMilliseconds":3}
I copied the first line of your input into stringdata = u'''...''' and then loaded it with json:
import json

dict_data = json.loads(stringdata.replace('\n', ''))
You then need to examine the data structure manually:
for c in dict_data['comments']:
    if c['reviewer']['name'].startswith('Klocwork'):  # or compare the full name exactly
        print(c['message'])
The output is:
Patch Set 1:This change is being verified in klocwork for the following manifests along with other changes as detailed below:=====================================================================================git-android.quicinc.com/platform/manifest:kk:default.xml=====================================================================================https://commander.qualcomm.com/commander/pages/SimplifiedJobView/LoadComponent_run?jobId=3217513-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- o https://review-android.quicinc.com/#change,Please note that verification of all changes in this batch need to be successful before this change can be merged.PLEASE DO NOT UPLOAD A NEW PATCH SET, OR REMOVE APPROVALS UNTIL THE VERIFICATION IS COMPLETE.
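If there can be several Klocwork comments and you only want the most recent one, pick the entry with the largest timestamp; a small variant of the loop above:
klocwork = [c for c in dict_data['comments']
            if c['reviewer']['name'] == 'Klocwork Automation User']
if klocwork:
    latest = max(klocwork, key=lambda c: c['timestamp'])
    print(latest['message'])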
