I'm using mathjax-node to try to convert mathjax code into an SVG. Currently, the code I have set up here is this:
const mathjax = require("mathjax-node");
process.stdin.on("data", data => {
  mathjax.typeset({
    math: data.slice(1),
    format: [...data][0] == "Y" ? "inline-TeX" : "TeX",
    svg: true
  }).then(data => {
    process.stdout.write(data.svg + String.fromCodePoint(0));
  });
});
It reads input from stdin: the first character determines whether the math is inline, and everything after it is the code. It's used by a Python file like this:
# -*- coding: utf-8 -*-
from subprocess import *
from pathlib import Path
cdir = "/".join(str(Path(__file__)).split("/")[:-1])
if cdir:
    cdir += "/"
converter = Popen(["node", cdir + "mathjax-converter.js"], stdin = PIPE, stdout = PIPE)
def convert_mathjax(mathjax, inline = True):
    converter.stdin.write(bytes(("Y" if inline else "N") + mathjax, "utf-8"))
    converter.stdin.flush()
    result = ""
    while True:
        char = converter.stdout.read(1)
        if not char: return ""
        if ord(char) == 0:
            return result
        result += char.decode("utf-8")
So convert_mathjax is the function that takes the code and turns it into the SVG. However, when I try to render the output just using data:text/html,<svg>...</svg>, it gives this error in the console:
Error: <path> attribute d: Expected number, "…3T381 315T301241Q265 210 201 149…".
Using MathJax client-side with the _SVG config option works fine, so how do I resolve this?
I can confirm that there is an error in that SVG path. The T command is supposed to have two coordinate parameters. But there is one in the middle there that doesn't.
T 381 315 T 301241 Q ...
is probably supposed to be:
T 381 315 T 301 241 Q ...
Either there is a bug in the mathjax SVG generator, or something else in your code is accidentally stripping random characters.
I need to parse a Windows file path and make sure that the file I have been given is a CSV file. I have tested the regex in an online regex tester and made sure it matches the text I provide.
Program.tx:
Program:
'begin'
commands*=Command
'end'
;
Command:
Test | Configuration
;
Test:
'test'
;
Configuration:
'configuration' location=/[a-zA-Z:a-zA-Z\\]+(\.csv$)/
;
test.dsl:
begin
configuration C:\Users\me\Desktop\test.csv
end
program.py:
from textx import metamodel_from_file
from Input import Input
class Robot(object):
    def __init__(self):
        self.input_location = None
    def setInput(self, location):
        self.input = Input(location)
    def interpret(self, model):
        for c in model.commands:
            if c.__class__.__name__ == "Configuration":
                self.setInput(c.location)
robot_mm = metamodel_from_file('Program.tx')
robot_model = robot_mm.model_from_file('test.dsl')
robot = Robot()
robot.interpret(robot_model)
Once I use Robot.interpret(), I cannot parse through the provided filepath
textx.exceptions.TextXSyntaxError: None:2:19: error: Expected '[a-zA-Z:a-zA-Z\\]+(\.csv$)' at position c:\Users\me\Desktop\test.dsl:(2, 19) => 'on *C:\Users\me\Des'.
After spending a day on the problem, it turns out that textX doesn't like the anchor character '$' in the regex.
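One likely explanation can be sketched with Python's standard `re` module rather than textX itself. The sketch assumes that textX tries the regex at the current position of the whole input text (an assumption about its internals, not something the post confirms). Without `re.MULTILINE`, `$` only matches at the very end of the string (or just before a final newline), so a path followed by another line such as `end` can never satisfy it:

```python
import re

# Contents of test.dsl from the question, as one string
text = "begin\nconfiguration C:\\Users\\me\\Desktop\\test.csv\nend\n"
start = text.index("C:")  # where the path begins

with_anchor = re.compile(r"[a-zA-Z:a-zA-Z\\]+(\.csv$)")
without_anchor = re.compile(r"[a-zA-Z:a-zA-Z\\]+(\.csv)")

# Match from the path's position inside the full input
m_with = with_anchor.match(text, start)        # fails: "\nend\n" follows ".csv"
m_without = without_anchor.match(text, start)  # succeeds

print(m_with)
print(m_without.group(0))
```

Under that assumption, dropping `$` from the `location` regex in `Program.tx` is the change that lets the rule match.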
For some reason, I have to run a PHP function in Python.
However, I've realized that it's beyond my ability.
So I'm asking for help here.
Below is the code:
function munja_send($mtype, $name, $phone, $msg, $callback, $contents) {
$host = "www.sendgo.co.kr";
$id = ""; // id
$pass = ""; // password
$param = "remote_id=".$id;
$param .= "&remote_pass=".$pass;
$param .= "&remote_name=".$name;
$param .= "&remote_phone=".$phone; //cellphone number
$param .= "&remote_callback=".$callback; // my cellphone number
$param .= "&remote_msg=".$msg; // message
$param .= "&remote_contents=".$contents; // image
if ($mtype == "lms") {
$path = "/Remote/RemoteMms.html";
} else {
$path = "/Remote/RemoteSms.html";
}
$fp = @fsockopen($host,80,$errno,$errstr,30);
$return = "";
if (!$fp) {
echo $errstr."(".$errno.")";
} else {
fputs($fp, "POST ".$path." HTTP/1.1\r\n");
fputs($fp, "Host: ".$host."\r\n");
fputs($fp, "Content-type: application/x-www-form-urlencoded\r\n");
fputs($fp, "Content-length: ".strlen($param)."\r\n");
fputs($fp, "Connection: close\r\n\r\n");
fputs($fp, $param."\r\n\r\n");
while(!feof($fp)) $return .= fgets($fp,4096);
}
fclose ($fp);
$_temp_array = explode("\r\n\r\n", $return);
$_temp_array2 = explode("\r\n", $_temp_array[1]);
if (sizeof($_temp_array2) > 1) {
$return_string = $_temp_array2[1];
} else {
$return_string = $_temp_array2[0];
}
return $return_string;
}
I would be glad if anyone could show me a way.
Thank you.
I don't know PHP, but based on my understanding, here should be a raw line-for-line translation of the code you provided, from PHP to python. I've preserved your existing comments, and added new ones for clarification in places where I was unsure or where you might want to change.
It should be pretty straightforward to follow - the difference is mostly in syntax (e.g. + for concatenation instead of .), and in converting str to bytes and vice versa.
import socket
def munja_send(mtype, name, phone, msg, callback, contents):
    host = "www.sendgo.co.kr"
    remote_id = "" # id (changed the variable name, since `id` is also a builtin function)
    password = "" # password (`pass` is a reserved keyword in python)
    param = "remote_id=" + remote_id
    param += "&remote_pass=" + password
    param += "&remote_name=" + name
    param += "&remote_phone=" + phone # cellphone number
    param += "&remote_callback=" + callback # my cellphone number
    param += "&remote_msg=" + msg # message
    param += "&remote_contents=" + contents # image
    if mtype == "lms":
        path = "/Remote/RemoteMms.html"
    else:
        path = "/Remote/RemoteSms.html"
    # change these parameters as necessary for your desired outcome
    fp = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
    fp.settimeout(30)
    errno = fp.connect_ex((host, 80))
    returnstr = b""
    if errno != 0:
        # I'm not sure where errmsg comes from in php or how to get it in python
        # errno should be the same, though, as it refers to the same system call error code
        print("Error(" + str(errno) + ")")
    else:
        # for accuracy, we convert param to bytes using utf-8 encoding
        # before checking its length. Change the encoding as necessary for accuracy
        body = param.encode("utf-8")
        # sockets send bytes, not str, so the header lines are encoded as well
        fp.sendall(("POST " + path + " HTTP/1.1\r\n").encode("utf-8"))
        fp.sendall(("Host: " + host + "\r\n").encode("utf-8"))
        fp.sendall(b"Content-type: application/x-www-form-urlencoded\r\n")
        fp.sendall(("Content-length: " + str(len(body)) + "\r\n").encode("utf-8"))
        fp.sendall(b"Connection: close\r\n\r\n")
        fp.sendall(body + b"\r\n\r\n")
        while (data := fp.recv(4096)):
            # fp.recv() returns an empty bytes object once eof has been hit
            returnstr += data
    fp.close()
    _temp_array = returnstr.split(b'\r\n\r\n')
    if len(_temp_array) < 2:
        # no response body (e.g. the connection failed)
        return ""
    _temp_array2 = _temp_array[1].split(b'\r\n')
    if len(_temp_array2) > 1:
        return_string = _temp_array2[1]
    else:
        return_string = _temp_array2[0]
    # here I'm converting the raw bytes to a python string, using encoding
    # utf-8 by default. Replace with your desired encoding if necessary
    # or just remove the `.decode()` call if you're fine with returning a
    # bytestring instead of a regular string
    return return_string.decode('utf-8')
If possible, you should probably use subprocess to execute your php code directly, as other answers suggest, as straight-up translating code is often error-prone and has slightly different behavior (case in point, the lack of errmsg and probably different error handling in general, and maybe encoding issues in the above snippet). But if that's not possible, then hopefully this will help.
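One of the encoding issues mentioned above is that the values are concatenated into the request body without percent-encoding, so a message containing `&`, `=`, or a space would corrupt the form data. A minimal sketch of building the body with the standard library follows; `build_params` is a hypothetical helper name, and UTF-8 is an assumption, since the service may expect a different charset. Note that the PHP original does not encode the values either, so this changes what is sent for values with special characters.

```python
from urllib.parse import urlencode

def build_params(remote_id, password, name, phone, callback, msg, contents):
    """Return the form body as bytes, with every value percent-encoded."""
    fields = [
        ("remote_id", remote_id),
        ("remote_pass", password),
        ("remote_name", name),
        ("remote_phone", phone),        # cellphone number
        ("remote_callback", callback),  # my cellphone number
        ("remote_msg", msg),            # message
        ("remote_contents", contents),  # image
    ]
    # urlencode() escapes reserved characters and turns spaces into "+"
    return urlencode(fields).encode("utf-8")

body = build_params("user", "secret", "A B", "010", "011", "hi&bye", "")
print(body)
```

The length of the returned bytes is what belongs in the `Content-length` header.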
According to the internet, you can use subprocess to execute the PHP script:
import subprocess
# if the script doesn't need output
subprocess.call(["php", "/path/to/your/script.php"])
# if you want output
proc = subprocess.Popen("php /path/to/your/script.php", shell=True, stdout=subprocess.PIPE)
script_response = proc.stdout.read()
PHP code can be executed in Python using the subprocess module or php.py, depending on the situation.
Please refer to this answer for further details.
I'm trying to adapt this pandoc filter, but I need to use Span instead of Div.
input file (myfile.md):
### MY HEADER
[File > Open]{.menu}
[\ctrl + C]{.keys}
Simply line
filter file (myfilter.py):
#!/usr/bin/env python
from pandocfilters import *
def latex(x):
    return RawBlock('latex', x)
def latex_menukeys(key, value, format, meta):
    if key == 'Span':
        [[ident, classes, kvs], contents] = value
        if classes[0] == "menu":
            return([latex('\\menu{')] + contents + [latex('}')])
        elif classes[0] == "keys":
            return([latex('\\keys{')] + contents + [latex('}')])
if __name__ == "__main__":
    toJSONFilter(latex_menukeys)
run:
pandoc myfile.md -o myfile.tex -F myfilter.py
pandoc:Error in $.blocks[1].c[0]: failed to parse field blocks: failed to parse field c: mempty
CallStack <fromHasCallStack>:
error, called at pandoc.hs:144:42 in main:Main
How should I use the variable "contents" correctly?
Suppose Span is inside a paragraph. Then you would be trying to replace it with a RawBlock, which is not going to work. Maybe try using RawInline instead?
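A minimal sketch of that suggestion follows. To keep it runnable without pandoc installed, it uses a hypothetical local helper `raw_inline` that builds the same JSON node `pandocfilters.RawInline` returns; in the real filter you would presumably call `RawInline('latex', ...)` directly. It also checks class membership instead of `classes[0]`, so a Span without classes is left untouched.

```python
def raw_inline(fmt, text):
    # Same JSON shape as pandocfilters.RawInline(fmt, text)
    return {"t": "RawInline", "c": [fmt, text]}

def latex_menukeys(key, value, format, meta):
    if key == "Span":
        [[ident, classes, kvs], contents] = value
        for cls in ("menu", "keys"):
            if cls in classes:
                # Inline elements on both sides, so the result can stay inside a paragraph
                return ([raw_inline("latex", "\\" + cls + "{")]
                        + contents
                        + [raw_inline("latex", "}")])
    # Returning None leaves every other element unchanged
    return None

span = [["", ["menu"], []], [{"t": "Str", "c": "File"}]]
print(latex_menukeys("Span", span, "latex", {}))
```

To run it as a filter, the function can be passed to `toJSONFilter` from pandocfilters, as in the original script.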
I am trying to convert a base64 string back to a GUID style hex number in python and having issues.
Base64 encoded string is: bNVDIrkNbEySjZ90ypCLew==
And I need to get it back to: 2243d56c-0db9-4c6c-928d-9f74ca908b7b
I can do it with the following PHP code but can't work out how to do it in Python:
function Base64ToGUID($guid_b64) {
$guid_bin = base64_decode($guid_b64);
return join('-', array(
bin2hex(strrev(substr($guid_bin, 0, 4))),
bin2hex(strrev(substr($guid_bin, 4, 2))),
bin2hex(strrev(substr($guid_bin, 6, 2))),
bin2hex(substr($guid_bin, 8, 2)),
bin2hex(substr($guid_bin, 10, 6))
));
}
Here is the GUIDtoBase64 version:
function GUIDToBase64($guid) {
$guid_b64 = '';
$guid_parts = explode('-', $guid);
foreach ($guid_parts as $k => $part) {
if ($k < 3)
$part = join('', array_reverse(str_split($part, 2)));
$guid_b64 .= pack('H*', $part);
}
return base64_encode($guid_b64);
}
Here are some of the results using some of the obvious and not so obvious options:
import base64
import binascii
>>> base64.b64decode("bNVDIrkNbEySjZ90ypCLew==")
'l\xd5C"\xb9\rlL\x92\x8d\x9ft\xca\x90\x8b{'
>>> binascii.hexlify(base64.b64decode("bNVDIrkNbEySjZ90ypCLew=="))
'6cd54322b90d6c4c928d9f74ca908b7b'
Python port of the existing function (bitstring required)
import bitstring, base64
def base64ToGUID(b64str):
    s = bitstring.BitArray(bytes=base64.b64decode(b64str)).hex
    def rev2(s_):
        def chunks(n):
            for i in range(0, len(s_), n):
                yield s_[i:i+n]
        return "".join(list(chunks(2))[::-1])
    return "-".join([rev2(s[:8]),rev2(s[8:][:4]),rev2(s[12:][:4]),s[16:][:4],s[20:]])
assert base64ToGUID("bNVDIrkNbEySjZ90ypCLew==") == "2243d56c-0db9-4c6c-928d-9f74ca908b7b"
First off, the b64 string and the resultant GUID don't match if we decode properly.
>>> import uuid
>>> import base64
>>> u = uuid.UUID("2243d56c-0db9-4c6c-928d-9f74ca908b7b")
>>> u
UUID('2243d56c-0db9-4c6c-928d-9f74ca908b7b')
>>> u.bytes
'"C\xd5l\r\xb9Ll\x92\x8d\x9ft\xca\x90\x8b{'
>>> base64.b64encode(u.bytes)
'IkPVbA25TGySjZ90ypCLew=='
>>> b = base64.b64decode('bNVDIrkNbEySjZ90ypCLew==')
>>> u2 = uuid.UUID(bytes=b)
>>> print u2
6cd54322-b90d-6c4c-928d-9f74ca908b7b
The base64 encoded version of the resultant GUID you posted is wrong. I'm not sure I understand the way you're encoding the GUID in the first place.
Python has in its arsenal all the tools required for you to be able to answer this problem. However, here's the rough scratching I did in a python terminal:
import uuid
import base64
base64_guid = "bNVDIrkNbEySjZ90ypCLew=="
bin_guid = base64.b64decode(base64_guid)
guid = uuid.UUID(bytes=bin_guid)
print(guid)
This code should give you enough of a hint to build your own functions. Don't forget, the python shell gives you a powerful tool to test and play with code and ideas. I would investigate using something like IPython notebooks.
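The byte order used by the PHP function in the question (first three groups reversed, last two left as they are) corresponds to what Python's `uuid` module calls `bytes_le`, so the conversion can be sketched with the standard library alone. The function names below are hypothetical, chosen to mirror the PHP ones.

```python
import base64
import uuid

def base64_to_guid(b64str):
    # bytes_le reads the first three fields as little-endian
    return str(uuid.UUID(bytes_le=base64.b64decode(b64str)))

def guid_to_base64(guid):
    # The reverse direction: emit the little-endian byte layout, then encode
    return base64.b64encode(uuid.UUID(guid).bytes_le).decode("ascii")

print(base64_to_guid("bNVDIrkNbEySjZ90ypCLew=="))
print(guid_to_base64("2243d56c-0db9-4c6c-928d-9f74ca908b7b"))
```

Using `bytes=` instead of `bytes_le=` gives 6cd54322-b90d-6c4c-928d-9f74ca908b7b, which is the mismatch seen earlier.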
I needed to do this to decode a Base64 UUID that had been dumped from MongoDB. Originally the field had been created by Mongoose. The code I used, based on the code by @tpatja, is here:
import base64
import bitstring
def base64ToGUID(b64str):
    try:
        raw = base64.urlsafe_b64decode(b64str)
    except Exception as e:
        print("Can't decode base64 ", e)
        # nothing to convert if decoding failed
        return None
    s = bitstring.BitArray(bytes=raw).hex
    return "-".join([s[:8],s[8:][:4],s[12:][:4],s[16:][:4],s[20:]])
Based on good answers above, I wrote a version that does not require the bitstring package and includes validations and support for more input options.
import base64
import regex
import uuid
from typing import Optional
def to_uuid(obj) -> Optional[uuid.UUID]:
    if obj is None:
        return None
    elif isinstance(obj, uuid.UUID):
        return obj
    elif isinstance(obj, str):
        if regex.match(r'[0-9a-fA-F]{8}[-]{0,1}[0-9a-fA-F]{4}[-]{0,1}[0-9a-fA-F]{4}[-]{0,1}[0-9a-fA-F]{4}[-]{0,1}[0-9a-fA-F]{12}', obj):
            return uuid.UUID(hex=obj)
        elif regex.match(r'[0-9a-zA-Z\+\/]{22}[\=]{2}', obj):
            b64_str = base64.b64decode(obj).hex()
            uid_str = '-'.join([b64_str[:8], b64_str[8:][:4], b64_str[12:][:4], b64_str[16:][:4], b64_str[20:]])
            return uuid.UUID(hex=uid_str)
        raise ValueError(f'{obj} is not a valid uuid/guid')
    else:
        raise ValueError(f'{obj} is not a valid uuid/guid')
Here is a scraper I created using Python on ScraperWiki:
import lxml.html
import re
import scraperwiki
pattern = re.compile(r'\s')
html = scraperwiki.scrape("http://www.shanghairanking.com/ARWU2012.html")
root = lxml.html.fromstring(html)
for tr in root.cssselect("#UniversityRanking tr:not(:first-child)"):
    if len(tr.cssselect("td.ranking")) > 0 and len(tr.cssselect("td.rankingname")) > 0:
        data = {
            'arwu_rank' : str(re.sub(pattern, r'', tr.cssselect("td.ranking")[0].text_content())),
            'university' : tr.cssselect("td.rankingname")[0].text_content().strip()
        }
    # DEBUG BEGIN
    if not type(data["arwu_rank"]) is str:
        print type(data["arwu_rank"])
        print data["arwu_rank"]
        print data["university"]
    # DEBUG END
    if "-" in data["arwu_rank"]:
        arwu_rank_bounds = data["arwu_rank"].split("-")
        data["arwu_rank"] = int( ( float(arwu_rank_bounds[0]) + float(arwu_rank_bounds[1]) ) * 0.5 )
    if not type(data["arwu_rank"]) is int:
        data["arwu_rank"] = int(data["arwu_rank"])
    scraperwiki.sqlite.save(unique_keys=['university'], data=data)
It works perfectly except when scraping the final data row of the table (the "York University" line), at which point instead of lines 9 through 11 of the code causing the string "401-500" to be retrieved from the table and assigned to data["arwu_rank"], those lines somehow seem instead to be causing the int 450 to be assigned to data["arwu_rank"]. You can see that I've added a few lines of "debugging" code to get a better understanding of what's going on, but also that that debugging code doesn't go very deep.
I have two questions:
What are my options for debugging scrapers run on the ScraperWiki infrastructure, e.g. for troubleshooting issues like this? E.g. is there a way to step through?
Can you tell me why the int 450, instead of the string "401-500", is being assigned to data["arwu_rank"] for the "York University" line?
EDIT 6 May 2013, 20:07h UTC
The following scraper completes without issue, but I'm still unsure why the first one failed on the "York University" line:
import lxml.html
import re
import scraperwiki
pattern = re.compile(r'\s')
html = scraperwiki.scrape("http://www.shanghairanking.com/ARWU2012.html")
root = lxml.html.fromstring(html)
for tr in root.cssselect("#UniversityRanking tr:not(:first-child)"):
    if len(tr.cssselect("td.ranking")) > 0 and len(tr.cssselect("td.rankingname")) > 0:
        data = {
            'arwu_rank' : str(re.sub(pattern, r'', tr.cssselect("td.ranking")[0].text_content())),
            'university' : tr.cssselect("td.rankingname")[0].text_content().strip()
        }
        # DEBUG BEGIN
        if not type(data["arwu_rank"]) is str:
            print type(data["arwu_rank"])
            print data["arwu_rank"]
            print data["university"]
        # DEBUG END
        if "-" in data["arwu_rank"]:
            arwu_rank_bounds = data["arwu_rank"].split("-")
            data["arwu_rank"] = int( ( float(arwu_rank_bounds[0]) + float(arwu_rank_bounds[1]) ) * 0.5 )
        if not type(data["arwu_rank"]) is int:
            data["arwu_rank"] = int(data["arwu_rank"])
        scraperwiki.sqlite.save(unique_keys=['university'], data=data)
Unfortunately, there's no easy way to debug your scripts on ScraperWiki. It just sends your code in its entirety and gets the results back, so there's no way to execute the code interactively.
I added a couple more prints to a copy of your code, and it looks like the if check before the bit that assigns data
if len(tr.cssselect("td.ranking")) > 0 and len(tr.cssselect("td.rankingname")) > 0:
doesn't trigger for "York University" so it will be keeping the int value (you set it later on) from the previous time around the loop.
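The effect can be reproduced without ScraperWiki. In this minimal sketch the rows are made up, and `midpoint`, `buggy`, and `fixed` are hypothetical names; a row with the wrong number of cells stands in for a table row that fails the `td.ranking` / `td.rankingname` check.

```python
def midpoint(rank):
    """Turn "401-500" into 450 and "7" into 7."""
    if "-" in rank:
        low, high = rank.split("-")
        return int((float(low) + float(high)) * 0.5)
    return int(rank)

def buggy(rows):
    seen = []
    data = None
    for row in rows:
        if len(row) == 2:
            data = {"arwu_rank": row[0], "university": row[1]}
        # Not nested under the if: also runs for rows that failed the check,
        # so data still holds the previous row's already-converted value
        seen.append(data["arwu_rank"])
        if isinstance(data["arwu_rank"], str):
            data["arwu_rank"] = midpoint(data["arwu_rank"])
    return seen

def fixed(rows):
    seen = []
    for row in rows:
        if len(row) != 2:
            continue  # skip rows that fail the check entirely
        data = {"arwu_rank": row[0], "university": row[1]}
        seen.append(data["arwu_rank"])
        data["arwu_rank"] = midpoint(data["arwu_rank"])
    return seen

rows = [("1", "Example University"), ("401-500", "York University"), ()]
print(buggy(rows))
print(fixed(rows))
```

In the buggy version the last, non-matching row sees the int left over from the "York University" row, which is the same symptom as in the question.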