ESC/POS thermal printer special characters: how to generate and print a receipt - Python

I tried to use the escpos package, but it doesn't print the special characters of my language; even with the ISO-8859-2 and Windows-1250 encodings the same behavior appears, regardless of Python version.
I have already printed a PDF, and the printer is able to print the special characters, so it seems to be a package problem. Since I cannot solve the problem with escpos, I decided to generate a PDF and then print it, but I am not sure what tools to use. I saw pdfkit, but some people advise using XML.
I am looking for advice on this, or maybe you know how to deal with escpos.
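A minimal sketch of the raw approach with python-escpos: select a hardware code page with ESC t, then send bytes encoded to match. The USB vendor/product IDs below are placeholders, _raw is the library's low-level escape hatch, and the ESC t value for CP852/Latin-2 varies by printer model (18 is common on Epsons), so check the printer's manual:

# Sketch: select a hardware code page, then send text encoded to match.
# The USB IDs and the ESC t code-page number are placeholder assumptions.
from escpos.printer import Usb

p = Usb(0x04b8, 0x0202)
p._raw(b'\x1bt\x12')                            # ESC t 18 -> PC852 on many Epsons
p._raw(u'Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144\n'.encode('cp852'))
p.cut()

Newer python-escpos releases also ship a charcode()/"magic encode" mechanism that is meant to pick the code page automatically; it is worth checking the documentation of the installed version.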

Related

PyPDF2 can't read non-English characters, returns empty string on extractText()

I'm working on a script that will extract data from a large PDF file (40-60+ pages long) that isn't in English; the file contains Greek characters. All seems good until I run the extractText() function of PyPDF2 to get a given page's contents: it returns an empty string.
I'm new to this library and I don't know what to do to fix this problem!
PyPDF2's "Extract Text" looks like it will either work just fine or fail completely. There are no parameters you can pass in to try to get things to work properly. It'll work or it won't.
You may not be able to fix this problem. If you can successfully copy/paste the text in Acrobat/Reader, then it's possible to extract the text. So what happens when you try to copy/paste out of Reader? Don't try this with some other third-party PDF viewer; use Adobe software. You'll probably have to abandon PyPDF2 and move on to some other PDF API, but if Reader can do it, it's a fixable problem.
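If you want to see programmatically which pages fail, a quick sketch using the same old PyPDF2 API as the question ('input.pdf' is a placeholder):

# List the pages whose extractText() comes back empty: the OCR candidates.
from PyPDF2 import PdfFileReader

reader = PdfFileReader(open('input.pdf', 'rb'))
empty = [i for i in range(reader.getNumPages())
         if not reader.getPage(i).extractText().strip()]
print('Pages with no extractable text:', empty)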
There are three different things in a PDF that can look like letters to the human eye.
1. Letters in the PDF in some text encoding. There are several fixed encodings, plus PDF allows you to embed your own custom encodings (often used with font subsets). Software can create PDFs that look fine but can't really be copy/pasted from, even by Adobe.
2. Path art that just happens to look an awful lot like letters. "Start drawing a line here, draw a straight line to there, then a curve like this to there" and so on. If you're curious, PDF uses Bezier curves to define its curves. Not terribly related to your question, but interesting.
3. Bitmaps (JPEG/GIF/etc. images) that define a grid of pixels.
In the past, Reader has only been able to handle text type 1 above, and then only if the text was encoded properly. Broken custom encodings are alarmingly common (or were 7+ years ago when I stopped working on PDF software).
With broken type 1s, and all of 2 and 3, the only thing you can do is to run OCR on the PDF. OCR: Optical Character Recognition. There are several open source OCR projects out there, as well as commercial ones.
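As a starting point for the OCR route, a sketch with two of the open source options, pdf2image and pytesseract (this assumes poppler, tesseract, and tesseract's Greek language data 'ell' are installed; 'input.pdf' is a placeholder):

# Render each PDF page to an image, then OCR it with Greek language data.
from pdf2image import convert_from_path
import pytesseract

for page_number, image in enumerate(convert_from_path('input.pdf'), start=1):
    text = pytesseract.image_to_string(image, lang='ell')
    print(page_number, text[:80])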

Facebook/messenger archive contains emoji that I am unable to parse

I can't figure out how to decode Facebook's way of encoding emoji in the Messenger archive.
Hi everyone,
I'm trying to code a handy utility to explore messenger's archive file with PYTHON.
The messages file is "badly encoded" JSON, as stated in this other post: Facebook JSON badly encoded
Using .encode('latin1').decode('utf8') I've been able to deal with most characters such as "é" or "à" and display them correctly. But I'm having a hard time with emoji, as they seem to be encoded in a different way.
Example of a problematic emoji: \u00f3\u00be\u008c\u00ba
The encoding/decoding does not yield any errors, but Tkinter is not willing to display what the function outputs and gives "_tkinter.TclError: character U+fe33a is above the range (U+0000-U+FFFF) allowed by Tcl". Tkinter is not the issue yet, though, because trying to display the same emoji in the console yields "ó¾º", which clearly isn't what's supposed to be displayed (it's supposed to be a crying face).
I've tried using the emoji library but it doesn't seem to help:
>>> print(emoji.emojize("\u00f3\u00be\u008c\u00ba"))
ó¾º
How can I retrieve the proper emoji and display it?
If it's not possible, how can I detect problematic emojis to maybe sanitize and remove them from the JSON in the first place?
Thank you in advance
.encode('latin1').decode('utf8') is correct: it results in the codepoint U+FE33A ("󾌺"). This codepoint is in a Private Use Area (PUA) (specifically Supplemental Private Use Area-A), so everyone can assign their own meaning to that codepoint (maybe Facebook wanted to use a crying face before there was one in Unicode, so they used the PUA?).
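You can verify the byte arithmetic yourself:

# The four \u00xx escapes are the bytes F3 BE 8C BA read as Latin-1;
# decoded as UTF-8 they form the single PUA codepoint U+FE33A.
s = u'\u00f3\u00be\u008c\u00ba'
decoded = s.encode('latin1').decode('utf8')
print(hex(ord(decoded)))  # 0xfe33a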
Googling for that char (https://www.google.com/search?q=󾌺) makes Google autocorrect it to U+1F62D ("😭"); sadly I have no idea how Google maps U+FE33A to U+1F62D.
Googling for U+fe33a site:unicode.org gives https://unicode.org/L2/L2010/10132-emojidata.pdf, which lists U+1F62D as proposed official codepoint.
As that document from Unicode lists U+FE33A as a codepoint used by Google, I searched for "android old emoji codepoints pua". Among other stuff, two actually usable results:
How to get Android emoji code point - the question links to:
https://unicodey.com/emoji-data/table.htm - an HTML table that seems to be acceptably parsable
and even better: https://github.com/google/mozc/blob/master/src/data/emoji/emoji_data.tsv - a tab-separated list that maps modern codepoints to legacy PUA codepoints and other information, like this:
1F62D 😭 FE33A E72D E411[...]
https://github.com/googlei18n/noto-emoji/issues/115 - this thread links to:
https://github.com/Crissov/noto-emoji/blob/legacy-pua/emoji_aliases.txt - a machine-readable document that translates legacy PUA codepoints to modern codepoints, like this:
FE33A;1F62D # Google
I included my search queries in the answer because none of the results I found are in any way authoritative, but it should be enough to get your tool working :-)
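As a sketch of how that last file could be used, assuming its lines keep the "FE33A;1F62D # Google" shape shown above (it may also contain underscore-joined codepoint sequences, which the parsing below handles):

# Build a PUA -> modern-emoji map from emoji_aliases.txt and translate.
def load_aliases(path):
    mapping = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.split('#')[0].strip()   # drop trailing comments
            if not line:
                continue
            old, new = line.split(';')
            src = ''.join(chr(int(cp, 16)) for cp in old.split('_'))
            dst = ''.join(chr(int(cp, 16)) for cp in new.split('_'))
            mapping[src] = dst
    return mapping

aliases = load_aliases('emoji_aliases.txt')
message = '\U000fe33a'                          # legacy PUA crying face
for legacy, modern in aliases.items():
    message = message.replace(legacy, modern)
print(message)                                  # should print the modern emoji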

Decoding KeyNote IWA protobuf data with Python

Good afternoon,
I am looking for a bit of insight into working with KeyNote files (~2017 ver 8.x).
My objective is fairly basic. I just want to extract the text and images from about 3000 KeyNote files. I am working in Python 2.7 due to the age of many of the tools, but I would like to upgrade to 3.x eventually. Despite a lot of reading and experimenting, I seem to have hit a wall extracting messages from the IWA objects.
I have been experimenting with various approaches and have also been trying to deconstruct the IWA files by hand using the protobuf encoding information. However, something just does not add up. Testing with messages created using the protobuf sample code I can deconstruct 100%, but IWA blocks from KeyNote files end up with invalid wire types, repeated field numbers, or field sizes that don't make sense (e.g. larger than the size of the IWA block).
What I think I know:
1/ The .key files are a grouping of objects that are zipped and can be unzipped using a generic module like zipfile. Once unzipped, the key file can be separated, giving access to the /Index branch and constituent IWA objects.
2/ The IWA files have a 4-byte little-endian header, and the rest should follow the Google protobuf encoding.
3/ The protobuf encoding does hold for some aspects of the IWA files, e.g. recognized blocks of text have the correct tags. However, other parts of the IWA do not seem to follow the rules, resulting in invalid wire-type codes (e.g. wire-type = 6) or field numbers that are zero or reused.
What I would appreciate is if:
A/ Someone could confirm that the KeyNote encoding does comply with the Google protobuf encoding, or point me at a valid encoding schedule or scheme that I can use.
B/ Someone could clarify whether the IWA objects are or are not individually compressed, in addition to the compression applied to the whole .key file. The documentation is unclear, but my attempts to further decompress the IWA objects were not successful.
C/ Someone could direct me to a functional Python library that can extract data from KeyNote files.
As much as I am having fun playing with file deconstruction at the byte and bit level, I still have an objective to achieve :-)
Thank you.
Rusty
Any insights gratefully accepted
I know this is a relatively old question, but I came across it and would offer up some information.
The page
https://github.com/obriensp/iWorkFileFormat/blob/master/Docs/index.md#iwa
seems to have a lot of info on the format. In particular, it seems (from what I gather from that page) that IWA does not exactly follow the protobuf encoding, which is probably the cause of your problems with invalid wire types and nonsensical field lengths.
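From what I gather from the same page, the IWA payloads are Snappy-compressed, but without the usual stream framing or CRCs, so the raw bytes will look like invalid protobuf until they are decompressed. Here is a sketch of that step under those assumptions, using the python-snappy package; the header layout (one type byte, then a 3-byte little-endian payload length) is my reading of that page, not something I have verified against real files:

# Sketch: unzip a .key and Snappy-decompress each IWA chunk before any
# protobuf parsing. Assumes chunks are: 1 type byte, 3-byte LE length,
# then a raw Snappy block with no framing or CRC.
import struct
import zipfile

import snappy  # the python-snappy package

with zipfile.ZipFile('presentation.key') as zf:
    for name in zf.namelist():
        if not name.endswith('.iwa'):
            continue
        data = zf.read(name)
        offset = 0
        while offset < len(data):
            length = struct.unpack('<I', data[offset + 1:offset + 4] + b'\x00')[0]
            payload = snappy.decompress(data[offset + 4:offset + 4 + length])
            print(name, len(payload), 'bytes decompressed')
            offset += 4 + length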

Can't Read Encoded Text in Visual FoxPro DBF Files

I recently acquired a ton of data stored in Visual FoxPro 9.0 databases. The text I need is in Cyrillic (Russian), but of the 1000 .dbf files (complete with .fpt and .cdx files), only 4 or 5 return readable text. The rest (usually in the form of memos) return something like this:
??9Y?u?
yL??x??itZ?????zv?|7?g?̚?繠X6?~u?ꢴe}
?aL1? Ş6U?|wL(Wz???8???7?#R?
.FAc?TY?H???#f U???K???F&?w3A??hEڅԦX?MiOK?,?AZ&GtT??u??r:?q???%,NCGo0??H?5d??]?????O{??
z|??\??pq?ݑ?,??om???K*???lb?5?D?J+z!??
?G>j=???N ?H?jѺAs`c?HK\i
??9a*q??
For the life of me, I can't figure out how this is encoded. I have tried all kinds of online decoders, opened the .dbf files in many database programs, and used Python to open and manipulate them. All of them return a similar mess to the above, but never readable Russian.
Note: I know that these databases are not corrupt, because they came accompanied by enterprise software that can open, query and read them successfully. However, that software will not export the data, so I am left working directly with the .dbfs.
Happy to share an example .dbf if it would help get to the bottom of this.
If it is a FoxPro database, I would expect the Russian there to be stored in some pre-Unicode encoding for Russian, as was usual for most Eastern European languages in the old days.
For example: Windows-1251 or ISO 8859-5.
'?' characters don't convey much. Try looking at the contents of the memo fields as hex, and see whether what you're seeing looks anything like text in any encoding. (Apologies if you've tried this using Python already.) Of course, if it is actually encrypted you may be out of luck unless you can find out the key and method.
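In Python that is a couple of lines (the file name is a placeholder):

# Dump the first memo bytes as hex for eyeballing.
import binascii

with open('table.fpt', 'rb') as f:
    print(binascii.hexlify(f.read(256)))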
There are two possibilities:
the encoding has not been correctly stored in the dbf file
the dbf file has been encrypted
If it's been encrypted I can't help you. If it's a matter of finding the correct encoding, my dbf package may be of use. Feel free to send me a sample dbf file if you get stuck.
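If it does turn out to be an encoding hunt, here is a sketch that tries the usual pre-Unicode Russian code pages on a slice of raw memo bytes (the file name and offset are placeholders):

# Try candidate Russian code pages on raw memo bytes and eyeball the output.
candidates = ['cp1251', 'cp866', 'koi8-r', 'iso8859-5']

with open('table.fpt', 'rb') as f:
    f.seek(512)             # placeholder offset past the memo-file header
    raw = f.read(200)

for enc in candidates:
    try:
        print(enc, '->', raw.decode(enc))
    except UnicodeDecodeError:
        print(enc, '-> not decodable')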

Python, Windows, Ansi - encoding, again

Hello there,
Even though I really tried... I'm stuck and somewhat desperate when it comes to Python, Windows, ANSI and character encoding. I need help, seriously... searching the web for the last few hours wasn't any help; it just drives me crazy.
I'm new to Python, so I have almost no clue what's going on. I'm about to learn the language, so my first program, which is almost done, should automatically generate music playlists from a given folder containing MP3s. That works just fine, besides one single problem...
...I can't write umlauts (äöü) to the playlist file.
After I found a solution for "wrong-encoded" data in sys.argv, I was able to deal with that. When reading metadata from the MP3s, I'm using some sort of simple character substitution to get rid of all those international special chars, like French accents or that crazy Scandinavian "o" with a slash in it (I don't even know how to type it...). All fine.
But I'd like to write at least the mentioned umlauts to the playlist file; those characters are really common here in Germany. And unlike the metadata, where I don't care about some missing characters or misspelled words, this is relevant, because now I'm writing the paths to the files.
I've tried so many various encoding and decoding methods, I can't list them all here... heck, I'm not even able to tell which settings I tried half an hour ago. I found code online, here, and elsewhere, that seemed to work for some purposes. Not for mine.
I think the tricky part is this: it seems the problem is the ANSI format of the files I need to write. Correct, I actually need this ANSI stuff. About two hours ago I actually managed to write whatever I'd like to a UTF-8 file. Works like a charm... until I realized that my player (Winamp, old version) somehow doesn't work with those UTF-8 playlist files. It couldn't resolve the path, even though it looks right in my editor.
If I change the file format back to ANSI, paths containing special chars get corrupted. I'm just guessing, but if Winamp reads these UTF-8 files as ANSI, that would cause the problem I'm experiencing right now.
So...
I DO have to write äöü in a path, or it will not work
It DOES have to be an ANSI-"encoded" file, or it will not work
Things like line.write(str.decode('utf-8')) break the function of the file
A magical comment at the beginning of the script like # -*- coding: iso-8859-1 -*- does nothing here (though it is helpful when it comes to the mentioned Metadata and allowed characters in it...)
Oh, and I'm using Python 2.7.3. Third-party module dependencies, you know...
Is there ANYONE who could guide me towards a way out of this encoding hell? Any help is welcome. If I need 500 lines of code for other functions or classes, I'll type them. If there's a module for handling such stuff, let me know! I'd buy it! Anything helpful will be tested.
Thank you for reading, thanks for any comment,
greets!
As mentioned in the comments, your question isn't very specific, so I'll try to give you some hints about character encodings, see if you can apply those to your specific case!
Unicode and Encoding
Here's a small primer about encoding. Basically, there are two ways to represent text in Python:
unicode. You can consider that unicode is the ultimate encoding; you should strive to use it everywhere. In Python 2.x source files, unicode strings look like u'some unicode'.
str. This is encoded text - to be able to read it, you need to know the encoding (or guess it). In Python 2.x, those strings look like 'some str'.
This changed in Python 3 (unicode is now str and str is now bytes).
How does that play out?
Usually, it's pretty straightforward to ensure that your code uses unicode for its execution, and uses str for I/O:
Everything you receive is encoded, so you do input_string.decode('encoding') to convert it to unicode.
Everything you need to output is unicode but needs to be encoded, so you do output_string.encode('encoding').
The most common encodings are cp1252 on Windows (on US or EU systems), and utf-8 on Linux.
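In code, the whole pattern boils down to this minimal Python 2 sketch, with cp1252 standing in for your ANSI code page:

# Bytes at the boundaries, unicode for everything in between.
raw = 'K\xfcche'                 # cp1252-encoded input bytes ('Küche')
text = raw.decode('cp1252')      # unicode for all internal work
out = text.encode('cp1252')      # back to bytes when writing out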
Applying this to your case
I DO have to write äöü in a path, or it will not work
Windows natively uses unicode for file paths and names, so you should actually always use unicode for those.
It DOES have to be an ANSI-"encoded" file, or it will not work
When you write to the file, be sure to always run your output through output.encode('cp1252') (or whatever encoding ANSI would be on your system).
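Applied to the playlist, a sketch in Python 2 that lets io.open do the encoding on the way out (the paths are made-up examples):

# Write an ANSI (cp1252) playlist; io.open encodes each unicode line.
import io

tracks = [u'C:\\Musik\\Bj\xf6rk\\J\xf3ga.mp3',
          u'C:\\Musik\\Die \xc4rzte\\M\xe4nner sind Schweine.mp3']

with io.open('playlist.m3u', 'w', encoding='cp1252') as f:
    for path in tracks:
        f.write(path + u'\n')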
Things like line.write(str.decode('utf-8')) break the function of the file
By now you probably realized that:
If str is indeed a str instance, Python will first try to convert it to unicode using the utf-8 encoding, and then try to encode it again (likely in ascii) to write it to the file.
If str is actually a unicode instance, Python will first encode it (likely in ascii, and that will probably crash) to then be able to decode it.
Bottom line: you need to know what str holds. If it's unicode, you should encode it; if it's already encoded, don't touch it (or decode it then encode it if the encoding is not the one you want!).
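Both failure modes are easy to reproduce in a Python 2 shell:

utf8_bytes = '\xc3\xa4'                    # 'ä' encoded as UTF-8
print(repr(utf8_bytes.decode('utf-8')))    # works: u'\xe4'

try:
    u'\xe4'.decode('utf-8')                # implicit ascii encode happens first
except UnicodeEncodeError as e:
    print('unicode.decode ->', e)

try:
    '\xe4'.decode('utf-8')                 # a lone cp1252 byte is not valid UTF-8
except UnicodeDecodeError as e:
    print('str.decode ->', e)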
A magical comment at the beginning of the script like # -*- coding: iso-8859-1 -*- does nothing here (though it is helpful when it comes to the mentioned metadata and allowed characters in it...)
Not a surprise, this only tells Python what encoding should be used to read your source file so that non-ascii characters are properly recognized.
Oh, and i'm using Python 2.7.3. Third-Party modules dependencies, you know...
Python 3 probably is a big update in terms of unicode and encoding, but that doesn't mean Python 2.x can't make it work!
Will that solve your issue?
You can't be sure; it's possible that the problem lies in the player you're using, not in your code.
Once you output it, you should make sure that your script's output is readable using reference tools (such as Windows Explorer). If it is, but the player still can't open it, you should consider updating to a newer version.
On Windows there is a special encoding available called mbcs; it converts between the current default ANSI code page and Unicode.
For example, on a Spanish-language PC:
u'ñ'.encode('mbcs') -> '\xf1'
'\xf1'.decode('mbcs') -> u'ñ'
On Windows, ANSI means the current default multi-byte code page: for Western European languages that is Windows-1252 (close to ISO-8859-1), for Eastern European languages Windows-1250 (close to ISO-8859-2), and other code pages for other languages as appropriate.
More info available at:
https://docs.python.org/2.4/lib/standard-encodings.html
See also:
https://docs.python.org/2/library/sys.html#sys.getfilesystemencoding
# -*- coding comments declare the character encoding of the source code (and therefore of byte-string literals like 'abc').
Assuming that by "playlist" you mean m3u files, then based on this specification you may be at the mercy of the MP3 player software you are using. The spec says only that the files contain text, with no mention of character encoding.
I have personally observed that various MP3 encoding programs will use different encodings for MP3 metadata. Some use UTF-8, others ISO-8859-1. So you may have to allow the encoding to be specified in configuration and leave it at that.
