Inserting Data from Dataframe into Ontology using SPARQL and RDFLIB in Python

I created an ontology using Protégé and now want to insert data using rdflib in Python. Because I have to write the SPARQL update statements as strings, and my data comes in various types including float64, integer, string, and datetime, I had to do some parsing, and not all of the inserts are working. Below is a snippet of my code:
df = df.tail(2000)
for ind in df.index:
    pData = df['Product'][ind]
    lData = df['Lifecycle left in minutes'][ind]
    XLPerc = df['Percent of Lifecycle left'][ind]
    q = """
    INSERT DATA
    {
        myontology:XP myontology:LifecycleData XL.
        myontology:XP myontology:UseCycleData XU.
        #myontology:XP myontology:LifecyclePer XLPerc.
        myontology:XP myontology:Temperature XTemperature.
        #myontology:XP myontology:LifecyclePer XLPerc
    }
    """.replace('XU', str(uData)).replace('XL', str(lData)).replace('XP', str(pData))
    g.update(
        q,
        initNs={
            "myontology": Namespace("https://js......../myontology.owl#")
        }
    )
So I am looping over my DataFrame (df) and inserting each row into the ontology. Some inserts work and some do not, despite using the same method. I am getting a ParseException as follows:
ParseException: Expected end of text, found 'I' (at char 5), (line:2, col:5)
The traceback is long, but this is its last line. I can provide more information if needed.
I do not know what the issue is, can somebody help me?
Thank you.

I have been able to rectify the problem myself.
The replace() calls were not substituting correctly because my placeholder names were too similar.
For instance, myontology:XP myontology:LifecyclePer XLPerc and myontology:XP myontology:LifecycleData XL. both contain XL: once inside XLPerc and once as XL itself.
So, during substitution, the XL inside XLPerc was replaced as well, producing something like 68.23433Perc instead of the expected value 68.23433, along with many other similar errors.
I solved this by making my placeholder names as distinct as possible, and now the query is built correctly.
Thank you everyone for your help.
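For anyone hitting the same thing, here is a minimal sketch of the collision and one way to avoid it. It uses string.Template from the standard library in place of chained replace() calls, which is a substitute technique and not what the original code did. The placeholder names ($product, $lifecycle, $percent) and the sample value "Pump1" are made up for illustration.

```python
from string import Template

# The collision: replacing 'XL' also rewrites the start of 'XLPerc'.
broken = "myontology:XP myontology:LifecyclePer XLPerc .".replace("XL", "68.23433")
# broken now ends with "68.23433Perc ." which is not valid SPARQL.

# With Template, each $name is substituted as a whole identifier,
# so one placeholder cannot clobber part of another.
QUERY = Template("""
INSERT DATA
{
    myontology:$product myontology:LifecycleData $lifecycle .
    myontology:$product myontology:LifecyclePer $percent .
}
""")


def build_query(product, lifecycle, percent):
    """Return the SPARQL update text for one row (values are converted with str())."""
    return QUERY.substitute(product=product, lifecycle=lifecycle, percent=percent)


# "Pump1" is a hypothetical product name.
query = build_query("Pump1", 120, 68.23433)
```

The resulting string can be passed to g.update() with the same initNs as before. substitute() raises KeyError if a placeholder is left unfilled, which surfaces a forgotten value early instead of sending a malformed query to the parser.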

Related

Converting Python Dict to JSON for MySQL field of JSON type

I am currently getting this error:
Invalid JSON text: "not a JSON text, may need CAST" at position 0 in value for column
This is the value I am trying to insert into my database:
{
    "ath": 69045,
    "ath_date": "2021-11-10T14:24:11.849Z",
    "atl": 67.81,
    "atl_date": "2013-07-06T00:00:00.000Z"
}
I believe it is due to malformed JSON; however, I am using json.dumps() to convert my dictionary. I have tried several things I have found over the last few hours to try and format it correctly but am hitting a wall between two errors.
I tried adding another level as well as wrapping it all in an array as that was recommended in another question, however, that produced the same error.
My Dict:
ticker_market_data[ticker] = {
    "all_time": {
        "ath": market_data["ath"]["usd"],
        "ath_date": market_data["ath_date"]["usd"],
        "atl": market_data["atl"]["usd"],
        "atl_date": market_data["atl_date"]["usd"],
    },
    "price_change_percent": {
        "1h": market_data["price_change_percentage_1h_in_currency"]["usd"],
        "24h": market_data["price_change_percentage_24h"],
        "7d": market_data["price_change_percentage_7d"],
        "30d": market_data["price_change_percentage_30d"],
        "1y": market_data["price_change_percentage_1y"],
    },
}
The problem items are all_time and price_change_percent.
This is how I am creating the variables to store in the database:
all_time = json.dumps(ticker_market_data[ticker].get("all_time"))
price_change_percent = json.dumps(ticker_market_data[ticker].get("price_change_percent"))
I wanted to answer my own question, though I don't know how much it will benefit the community, as the mistake was entirely mine. The code posted above was correct; the problem lay in the order of the variables passed to the SQL insert. A variable of the wrong type sat one position ahead of where it belonged, so a float was being inserted into the JSON column instead of the JSON string.
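One way to make this kind of ordering mistake harder to repeat is to derive the column list and the parameter tuple from a single dict, so they cannot drift apart. The sketch below is only an illustration: the table name market_data and the columns ticker and price are invented, and the %s placeholder style is the one used by common MySQL drivers such as PyMySQL and mysql-connector-python.

```python
import json


def build_insert(table, row):
    """Return (sql, params) with columns and values taken from one dict.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    columns = list(row)  # dicts preserve insertion order
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    params = tuple(row[column] for column in columns)
    return sql, params


# Hypothetical row; only all_time mirrors the question.
row = {
    "ticker": "btc",
    "price": 67000.5,
    "all_time": json.dumps({"ath": 69045, "atl": 67.81}),
}
sql, params = build_insert("market_data", row)
```

The pair would then be passed to cursor.execute(sql, params). Because each value is keyed by its column name, a float cannot end up in the JSON column just because two variables were swapped in a positional list.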

Zipline-trader: Unknown syntax [{'sid': Equity(1576 [JPM])}

I am relatively new to Python and am working my way through the zipline-trader library. I came across a data structure that I am unfamiliar with and was wondering if you could help me access a certain element out of it.
I ran a backtest on zipline-trader and have the results-DataFrame that has a column "positions" which includes the portfolio positions for a given day.
Here is an example of the content of that column:
[{'sid': Equity(1576 [JPM]), 'amount': 39, 'cost_basis': 25.95397, 'last_sale_price': 25.94}, {'sid': Equity(2942 [UNH]), 'amount': 11, 'cost_basis': 86.62428999999999, 'last_sale_price': 86.58}]
The syntax I am unfamiliar with is the part "Equity(1576 [JPM])". Can anybody explain to me what this is? Also, can you please let me know how to access the "[JPM]" part of it? Ultimately, what I am trying to do is access that cell of the DataFrame using loc and produce the result "{JPM: 1576, UNH: 2942}".
Thank you!
That is (likely to be) an object of type Equity. If the structure you showed us was stored in a variable data then the object can be fetched using
eq = data[0]['sid']
The text when it's printed will be coming from the __str__ method defined in the Equity class, so it doesn't really tell us anything about how to access it. You would have to look up the documentation.
If you are able to access the object in an interactive session then you could run the help command against it and that might contain something useful. Again, if the structure you showed us was stored in a variable data then you could do:
help(data[0]['sid'])
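Once help() has shown which attributes the object exposes, building the requested dict is a one-line comprehension. The sketch below assumes the Equity object has symbol and sid attributes; that is an assumption to confirm against the help() output. FakeEquity is a made-up stand-in so the example runs without zipline installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeEquity:
    """Hypothetical stand-in for the Equity object; attribute names are assumed."""
    sid: int
    symbol: str


def symbols_to_sids(positions):
    """Map each position's ticker symbol to its numeric sid."""
    return {position["sid"].symbol: position["sid"].sid for position in positions}


# Same shape as one cell of the "positions" column in the question.
positions = [
    {"sid": FakeEquity(1576, "JPM"), "amount": 39},
    {"sid": FakeEquity(2942, "UNH"), "amount": 11},
]
result = symbols_to_sids(positions)
```

With a real results DataFrame, positions would be the value of one cell, for example the "positions" column selected with loc for a given day.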

What is equivalent of Perl DB_FILE module in Python?

I was asked by my supervisor to convert some Perl scripts into Python. I'm baffled by a few lines of code, and I am also relatively inexperienced with Python. I'm an IT intern, so this is something of a challenge.
Here are the lines of code:
my %sybase;
my $S= tie %sybase, "DB_File", $prismfile, O_RDWR|O_CREAT, 0666, $DB_HASH or die "Cannot open: $!\n";
$DB_HASH->{'cachesize' } = $cache;
I'm not sure what the equivalent of this statement is in Python. DB_File is a Perl module. DB_HASH is a database type that allows arbitrary keys/values to be stored in a data file, at least according to the Perl documentation.
After that, the next lines of code also got me stumped on how to convert this to the equivalent in Python as well.
$scnt=0;
while(my $row=$msth->fetchrow_arrayref()) {
    $scnt++;
    $srow++;
    #if ($scnt <= 600000) {
    $S->put(join('#',@{$row}[0..5]),join('#',@{$row}[6..19]));
    perf(500000,'sybase') ;#if $VERBOSE ;
    # }
}
I'll probably use fetchall() in Python to store the entire result dataset in it, then work through it row by row. But I'm not sure how to implement join() correctly in Python, especially since these lines use range within the row index elements -- [0..5]. Also it seems to write the output to data file (look at put()). I'm not sure what perf() does, can anyone help me out here?
I'd appreciate any kind of help here. Thank you very much.
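As a rough starting point, the put(join(...), join(...)) line can be sketched in Python with the standard-library dbm package. This is not a drop-in replacement for DB_File: dbm.dumb writes its own file format, so files created by the Perl script would not be readable this way, and there is no cachesize setting. The function name store_rows and the 20-column sample row are invented for illustration.

```python
import dbm.dumb
import os
import tempfile


def store_rows(db_path, rows, sep="#"):
    """Store each row as key = fields 0..5 and value = fields 6..19."""
    with dbm.dumb.open(db_path, "c") as db:  # "c": create the file if missing
        for row in rows:
            # Perl's @{$row}[0..5] is inclusive; Python's row[0:6] excludes index 6.
            key = sep.join(str(field) for field in row[0:6])
            value = sep.join(str(field) for field in row[6:20])
            db[key] = value


# Hypothetical data: one row with 20 columns, as cursor.fetchall() might return.
rows = [tuple(range(20))]
db_path = os.path.join(tempfile.mkdtemp(), "prism")
store_rows(db_path, rows)

with dbm.dumb.open(db_path, "r") as db:
    stored = db["0#1#2#3#4#5"].decode()  # values come back as bytes
```

Iterating over the cursor row by row instead of calling fetchall() would keep memory use closer to the Perl while loop. What perf() does cannot be told from the snippet; it is a helper defined elsewhere in the script.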

looping through json python is very slow

Can someone help me understand what I'm doing wrong in the following code:
def matchTrigTohost(gtriggerids,gettriggers):
    mylist = []
    for eachid in gettriggers:
        gtriggerids['params']['triggerids'] = str(eachid)
        hgetjsonObject = updateitem(gtriggerids,processor)
        hgetjsonObject = json.dumps(hgetjsonObject)
        hgetjsonObject = json.loads(hgetjsonObject)
        hgetjsonObject = eval(hgetjsonObject)
        hostid = hgetjsonObject["result"][0]["hostid"]
        hname = hgetjsonObject["result"][0]["name"]
        endval = hostid + "--" + hname
        mylist.append(endval)
    return(hgetjsonObject)
The variable gettriggers contains a lot of ids (~3500):
[ "26821", "26822", "26810", ..... ]
I'm looping through the ids in the variable and assigning them to a json object.
gtriggerids = {
    "jsonrpc": "2.0",
    "method": "host.get",
    "params": {
        "output": ["hostid", "name"],
        "triggerids": "26821"
    },
    "auth": mytoken,
    "id": 2
}
When I run the code against the above json variable, it is very slow. It is taking several minutes to check each ID. I'm sure I'm doing many things wrong here or at least not in the pythonic way. Can anyone help me speed this up? I'm very new to python.
NOTE:
The dump() , load(), eval() were used to convert the str produced to json.
You asked for help knowing what you're doing wrong. Happy to oblige :-)
At the lowest level—why your function is running slowly—you're running many unnecessary operations. Specifically, you're moving data between formats (python dictionaries and JSON strings) and back again which accomplishes nothing but wasting CPU cycles.
You mentioned this is the only way you could get the data in the format you needed. That brings me to the second thing you're doing wrong.
You're throwing code at the wall instead of understanding what's happening.
I'm quite sure (and several of your commenters appear to agree) that your code is not the only way to arrange your data into a usable structure. What you should do instead is:
1. Understand as much as you can about the data you're being given. I suspect the output of updateitem() should be your first target of learning.
2. Understand the right/typical way to interact with that data. Your data doesn't have to be a dictionary before you can use it. Maybe it's not the best approach.
3. Understand what regularities and irregularities the data may have. Part of your problem may not be with types or dictionaries, but with an unpredictable/dirty data source.
4. Armed with all this new knowledge, manipulate your data as simply as you can.
I can pretty much guarantee the result will run faster.
More detail! Some things you wrote suggest misconceptions:
"I'm looping through the ids in the variable and assigning them to a json object."
No, you can't assign to a JSON object. In python, JSON data is always a string. You probably mean that you're assigning to a python dictionary, which (sometimes!) can be converted to a JSON object, represented as a string. Make sure you have all those concepts clear before you move forward.
"The dump() , load(), eval() were used to convert the str produced to json."
Again, you don't call dumps() on a string. You use that to convert a python object to a string. Run this code in a REPL, go step by step, and inspect or play with each output to understand what it is.
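To make the point concrete, here is a minimal sketch of the loop with the redundant conversions removed. It assumes updateitem() returns either a dict or a JSON string; which one it is needs checking in a REPL first. fake_updateitem is a made-up stand-in so the example runs on its own, and the function returns the accumulated list, which appears to be what the original was building.

```python
import json


def fake_updateitem(request):
    """Hypothetical stand-in for updateitem(); returns the response as a JSON string."""
    trigger_id = request["params"]["triggerids"]
    return json.dumps(
        {"result": [{"hostid": "h" + trigger_id, "name": "host-" + trigger_id}]}
    )


def match_trig_to_host(request, trigger_ids, call=fake_updateitem):
    """Return 'hostid--name' for the first host matched by each trigger id."""
    matches = []
    for trigger_id in trigger_ids:
        request["params"]["triggerids"] = str(trigger_id)
        response = call(request)
        if isinstance(response, str):
            response = json.loads(response)  # parse once, only if it is a string
        host = response["result"][0]
        matches.append(host["hostid"] + "--" + host["name"])
    return matches


matched = match_trig_to_host({"params": {}}, ["26821", "26822"])
```

This removes the dumps/loads/eval round trip, but if each call really takes minutes, most of the time is probably spent inside updateitem() itself, so timing that call separately would be the next thing to check.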

Using an IF THEN loop with nested JSON files in Python

I am currently writing a program which uses the Companies House API to return a JSON file containing information about a certain company.
I am able to retrieve the data easily using the following commands:
r = requests.get('https://api.companieshouse.gov.uk/company/COMPANY-NO/filing-history', auth=('API-KEY', ''))
data = r.json()
With that information I can do an awful lot; however, I've run into a problem which I was hoping you could help me with. What I aim to do is go through every nested entry in the JSON file and check whether the values of certain keys match certain criteria. If the values of two keys match, other code is executed.
One of the keys is the date of an entry, and I would like to ignore results that are older than a certain date, I have attempted to do this with the following:
date_threshold = datetime.date.today() - datetime.timedelta(days=30)
for each in data["items"]:
    date = ['date']
    type = ['type']
    if date < date_threshold and type is "RM01":
        print("wwwwww")
In case it isn't clear, what I'm attempting to do (albeit very badly) is assign each of the entries to a variable, which then gets tested against certain criteria.
This doesn't work, though; Python raises a type error:
TypeError: unorderable types: list() < datetime.date()
Which makes me think the date is being stored as a string, and so I can't compare it to the datetime value set earlier, but when I check the API documentation (https://developer.companieshouse.gov.uk/api/docs/company/company_number/filing-history/filingHistoryItem-resource.html), it says clearly that the 'date' entry is returned as a date type.
What am I doing wrong? It's very clear that I'm extremely new to Python, given what I presume is the atrocity of my code, but in my head it seems to make at least a little sense. In case none of this is clear, I basically want to go through all the entries in the JSON file, and if the date and type match a certain description, execute other code (in this case I have just printed random text).
Any help is greatly appreciated! Let me know if you need anything cleared up.
:)
EDIT
After tweaking my code to the below:
for each in data["items"]:
    date = each['date']
    type = each['type']
    if date is '2016-09-15' and type is "RM01":
        print("wwwwww")
The code executes without any errors, but the words aren't printed, even though I know there is an entry in the json file with that exact date, and that exact type, any thoughts?
SOLUTION:
Thanks to everyone for helping me out. I had made a couple of very basic errors; the code that works as expected is below:
for each in data["items"]:
    date = each['date']
    typevariable = each['type']
    if date == '2016-09-15' and typevariable == "RM01":
        print("wwwwww")
This prints the word "wwwwww" 3 times, which is correct seeing as there are 3 entries in the JSON that fulfil those criteria.
You need to first convert your date variable to a datetime type using datetime.strptime().
You are comparing a list type variable date with a datetime type variable date_threshold: date = ['date'] creates a one-element list instead of reading each['date'].
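Putting both points together, a minimal sketch of the threshold comparison could look like this. It assumes the API returns dates as "YYYY-MM-DD" strings, as the '2016-09-15' value in the question suggests. The function name recent_items and the sample items are invented, and today is passed in explicitly so the result does not depend on when the code runs.

```python
import datetime


def recent_items(items, wanted_type, today, max_age_days=30):
    """Return items of wanted_type dated within the last max_age_days."""
    threshold = today - datetime.timedelta(days=max_age_days)
    selected = []
    for item in items:
        # Convert the string to a date so it can be compared with threshold.
        item_date = datetime.datetime.strptime(item["date"], "%Y-%m-%d").date()
        # Use == for value comparison; 'is' tests object identity.
        if item_date >= threshold and item["type"] == wanted_type:
            selected.append(item)
    return selected


# Hypothetical entries shaped like data["items"].
items = [
    {"date": "2016-09-15", "type": "RM01"},
    {"date": "2016-01-02", "type": "RM01"},
    {"date": "2016-09-20", "type": "AA"},
]
result = recent_items(items, "RM01", today=datetime.date(2016, 9, 30))
```

In real use, today would be datetime.date.today() and items would be data["items"] from the response.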
