Scrapy - FEED_EXPORT_ENCODING Doesn't Work in Ubuntu Server - python

Even though my local machine and the server run the same Scrapy version, setting FEED_EXPORT_ENCODING = 'utf-8' in settings.py has no effect on the server when exporting the results to a JSON file.
What I set in settings.py:
FEED_EXPORT_ENCODING = 'utf-8'
The command I run to get the result:
scrapy crawl spiderName -o file.json
What I get in return:
...
'content': u'\n \r\nTruffle
\u0628\u0627 \u0645\u0627\u0654\u0645\u0648'
u'\u0631\u06cc\u062a
\u0631\u0627\u062d\u062a\u200c\u062a\u0631 \u06a9\u0631\u062f\u0646
\u0632'
u'\u0646\u062f\u06af\u06cc
\u062f\u0648\u0644\u0648\u067e\u0631\u0647\u0627\u06cc \u06a9\u0631\
u06cc\u067e\u062a\u0648\u06a9\u
...
I run exactly the same process on my local machine, and there every Unicode escape is written out as readable UTF-8 text.
What would you suggest?
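As a point of reference, the two output styles can be reproduced with the standard json module. This is only a sketch of the difference the setting is expected to make, assuming the JSON export behaves like json.dumps with ensure_ascii switched on or off; the item below is made up and no Scrapy code is involved.

```python
import json

# Hypothetical item standing in for one scraped record.
item = {"content": "Truffle \u0628\u0627"}

# ASCII-safe output: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(item)

# UTF-8 output: the characters are kept and encoded as UTF-8 bytes.
readable = json.dumps(item, ensure_ascii=False).encode("utf-8")

print(escaped)
print(readable.decode("utf-8"))
```

Both forms decode to the same data, so checking which one ends up in file.json on each machine shows whether the setting is being picked up at all.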

Related

How to create multilingual web pages with fastapi-babel?

I'm thinking of creating a multilingual web page with fastapi-babel.
I have configured according to the documentation.
The translation from English to French was successful.
However, I created a .po file for another language, translated it, compiled it, but the translated text does not apply.
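import logging
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from fastapi_babel import _
from fastapi_babel.middleware import InternationalizationMiddleware as I18nMiddleware
from fastapi_babel import Babel
from fastapi_babel import BabelConfigs
# Assumed setup, not shown in the original post: logger, app and templates.
logger = logging.getLogger(__name__)
app = FastAPI()
templates = Jinja2Templates(directory="templates")
configs = BabelConfigs(
    ROOT_DIR=__file__,
    BABEL_DEFAULT_LOCALE="en",
    BABEL_TRANSLATION_DIRECTORY="lang",
)
logger.info(f"configs: {configs.__dict__}")
# The class is Babel (capital B); calling babel(...) raises a NameError.
babel = Babel(configs=configs)
babel.install_jinja(templates)
app.add_middleware(I18nMiddleware, babel=babel)
@app.get("/items/{id}", response_class=HTMLResponse)
async def read_item(request: Request, id: str):
    # Switch the locale and log the same message each time.
    babel.locale = "en"
    logger.info(_("Hello World"))
    babel.locale = "fa"
    logger.info(_("Hello World"))
    babel.locale = "ja"
    logger.info(_("Hello World"))
    return templates.TemplateResponse('item.html', {'request': request, 'id': id})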
Above, the result will be:
INFO: Hello World
INFO: Bonjour le monde
INFO: Hello World
How can the translation be applied to languages other than French?
I was using the old version 0.0.3.
When I changed the version to the latest 0.0.8, the translation was reflected in languages other than French.
pip install fastapi-babel==0.0.8
Note
You need to restart the FastAPI server after running pybabel compile -d lang.
If BABEL_DEFAULT_LOCALE and babel.locale are the same, no translation is applied.
babel = Babel(
    configs=BabelConfigs(
        ROOT_DIR=__file__,
        BABEL_DEFAULT_LOCALE="en",
        BABEL_TRANSLATION_DIRECTORY="lang",
    )
)
babel.locale = "en"
When you update the translation files, run these two commands:
pybabel extract -F babel.cfg -o messages.pot .
pybabel compile -d lang
Do not run the following command after you have created the .po file:
pybabel init -i messages.pot -d lang -l fa
If you do, your .po file will be reset and all your translations will be deleted.

python3 default encoding UnicodeDecodeError ascii using apache WSGI

import locale
prefered_encoding = locale.getpreferredencoding()
# prefered_encoding is 'ANSI_X3.4-1968' on the staging server
I'm using a framework called INGInious, which uses web.py to render its templates:
web.template.render(os.path.join(root_path, dir_path),
globals=self._template_globals,
base=layout_path)
The rendering works on my localhost but not on my staging server.
Both run Python 3. I see that web.py enforces UTF-8 encoding only on Python 2 (that's out of my hands):
def __str__(self):
    self._prepare_body()
    if PY2:
        return self["__body__"].encode('utf-8')
    else:
        return self["__body__"]
Here is the stack trace:
    t = self._template(name)
  File "/lib/python3.5/site-packages/web/template.py", line 1028, in _template
    self._cache[name] = self._load_template(name)
  File "/lib/python3.5/site-packages/web/template.py", line 1016, in _load_template
    return Template(open(path).read(), filename=path, **self._keywords)
  File "/lib64/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 83: ordinal not in range(128)
My HTML does include Hebrew characters. A small example:
<div class="modal-content">
<div class="modal-header">
<button type="button" class="close" data-dismiss="modal">×</button>
<h4 class="modal-title feedback-modal-title">
חישוב האיברים הראשונים בסדרה של איבר ראשון חיובי ויחס שלילי:
<span class="red-text">אי הצלחה</span>
and I open it like so:
open('/path/to/feedback.html').read()
and the line where the encoding fails is where the Hebrew chars are.
I tried setting some environment variables in ~/.bashrc:
export PYTHONIOENCODING=utf8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
under the user centos.
The INGInious framework is installed via pip into the python3.5 site-packages, and it is served by an Apache server running as the user apache.
I tried setting the environment variables in the code (during app initialization) so that the Apache WSGI process would be aware of them:
import os
os.environ['LC_ALL'] = 'en_US.UTF-8'
os.environ['LANG'] = 'en_US.UTF-8'
os.environ['LANGUAGE'] = 'en_US.UTF-8'
I have also edited /etc/httpd/conf/httpd.conf using the SetEnv directive:
SetEnv LC_ALL en_US.UTF-8
SetEnv LANG en_US.UTF-8
SetEnv LANGUAGE en_US.UTF-8
SetEnv PYTHONIOENCODING utf8
and restarted using sudo service httpd restart, but still no luck.
My question is: what is the best practice for solving this? I understand there are hacks for this, but I want to understand the underlying cause as well as how to solve it.
Thanks!
I finally found the answer: pass the encoding explicitly when reading the file.
I changed from
open('/path/to/feedback.html').read()
to
import codecs
with codecs.open(file_path, 'r', encoding='utf8') as f:
    text = f.read()
If anyone has a more general approach that works, I'll accept their answer.
A Python 2+3 solution would be:
import io
with io.open(file_path, mode='r', encoding='utf8') as f:
    text = f.read()
See the documentation of io.open.
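As for the underlying cause: on Python 3, open() without an encoding argument uses locale.getpreferredencoding(False), and 'ANSI_X3.4-1968' is just another name for ASCII, so any non-ASCII byte in the template fails to decode. The following is a minimal Python 3 sketch of that behaviour; the temporary file is a made-up stand-in for feedback.html, and the ASCII case is simulated with an explicit encoding rather than by changing the process locale.

```python
import locale
import os
import tempfile

# This is the codec open() falls back to when no encoding is given.
print(locale.getpreferredencoding(False))

# Hypothetical stand-in for feedback.html, stored as UTF-8 bytes.
text = '<span class="red-text">\u05d0\u05d9 \u05d4\u05e6\u05dc\u05d7\u05d4</span>'
path = os.path.join(tempfile.mkdtemp(), "feedback.html")
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))

# Explicit encoding: the result no longer depends on the process locale.
with open(path, encoding="utf-8") as f:
    decoded = f.read()

# Simulate the staging server, where the default codec was ASCII.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(decoded == text, ascii_failed)
```

Variables exported in ~/.bashrc only reach interactive shells of that user, which would explain why the Apache process never saw them; passing the encoding at the call site avoids depending on the environment at all.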

Django 1.7 dumpdata on Windows scrambles unicode characters

I use manage.py dumpdata --format xml --some-more-parameters to export a full dump of the database to xml. The database is MS sql server and I'm using pyodbc as the driver. The dumpdata command is run using PowerShell and since Django 1.7 does not support a --output argument for the dumpdata command I redirect the output into a file using PowerShell.
Unfortunately the database contains Unicode characters (e.g. the country Österreich) and these characters are scrambled in the export file.
Here's what didn't work:
./manage.py dumpdata --format xml > export.xml
./manage.py dumpdata --format xml | out-file -encoding utf8 export.xml
./manage.py dumpdata --format xml | out-file -encoding ANY_OTHER_SUPPORTED_ENCODING export.xml
None of these commands work. Umlauts and accents are scrambled, and in addition the > export.xml method adds an invalid BOM to the file, which causes ./manage.py loaddata export.xml to abort with a UnicodeDecodeError when I try to import it on another host.
Any suggestions on how I could export the data and preserve the special characters? The same problem exists when using the json or yaml serializers.
I was able to work around this problem using my own export script. The script below will dump the data and store it in a utf-8 encoded xml file called export_CURRENT-DATE-TIME.xml. call_command() calls the dumpdata command in Django. The script below should be equivalent to using dumpdata with the following arguments:
./manage.py dumpdata --natural --natural-foreign --natural-primary --format xml --indent 2
import sys
import codecs
import os
import django
from django.core.management import call_command
from StringIO import StringIO
from datetime import datetime
# setup access to django
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "PROJECT_NAME.settings")
django.setup()
# the actual export command
def do_work():
    #print(u"\xd6sterreich")
    call_command('dumpdata', use_natural_keys=True, use_natural_foreign_keys=True, use_natural_primary_keys=True, format='xml', indent=2)
# nasty hack to workaround encoding issues on windows
_stdout = sys.stdout
sys.stdout = StringIO()
do_work()
value = sys.stdout.getvalue().decode('utf-8')
sys.stdout = _stdout
with codecs.open('export_{}.xml'.format(datetime.now().strftime("%Y-%m-%d_%H-%M")), 'w', 'utf-8-sig') as f:
    f.write(value)
print("export completed")
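The script above targets Python 2 (it imports StringIO). For comparison, here is a rough Python 3 sketch of the same capture-and-write idea using only the standard library. The do_work function here is a hypothetical stand-in that just prints a line; it is not the Django call, and the output path is a temporary file chosen for illustration.

```python
import contextlib
import io
import os
import tempfile

def do_work():
    # Hypothetical stand-in for the dumpdata call: anything that prints to stdout.
    print('<field name="country">\u00d6sterreich</field>')

# Capture everything printed to stdout in an in-memory text buffer.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    do_work()
value = buffer.getvalue()

# Write the captured text with an explicit encoding, bypassing the shell redirect.
out_path = os.path.join(tempfile.mkdtemp(), "export.xml")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(value)
```

Writing with 'utf-8' produces no BOM, whereas 'utf-8-sig' (as in the script above) writes one; which is preferable depends on what will read the file afterwards.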

Locale on django and uwsgi UnicodeEncodeError

EDIT: I just realized that it works when I don't try to print that variable to the console. Why?
I ran into an issue displaying a string label that contains non-ASCII characters. I set the locale environment in the uwsgi ini file like this:
env = LC_ALL=en_US.UTF-8
env = LANG=en_US.UTF-8
and in wsgi.py:
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
When I run app code:
print (locale.getlocale(), locale.getpreferredencoding())
print locale.getdefaultlocale()
print "option_value", option_value
label = force_text(option_label)
print 'label', label #THIS FAILS
the output is:
(('en_US', 'UTF-8'), 'UTF-8')
('en_US', 'UTF-8')
option_value d
ERROR <stack trace>
print 'label', label
UnicodeEncodeError: 'ascii' codec can't encode character u'\u015b' in position 5: ordinal not in range(128)
The problem does not occur when I run the app via runserver in the production environment.
Django 1.6.5 Python 2.7.6 Ubuntu 14.04 uWSGI 2.0.5.1
I just found the answer here: http://chase-seibert.github.io/blog/2014/01/12/python-unicode-console-output.html
I realized that the console is responsible for the error, so exporting an additional environment variable in the uwsgi config file solves the issue: env = PYTHONIOENCODING=UTF-8
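To illustrate why only the print fails: the text itself is fine, but print has to encode it for whatever stream stdout is attached to. The following is a small Python 3 sketch (the question uses Python 2, where the details differ but the idea is the same); try_print and the in-memory streams are made up for the example and stand in for a console with a given encoding.

```python
import io

label = "\u015b"  # the character named in the error message above

def try_print(encoding):
    # Stand-in for a console: a text stream that encodes to the given codec.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print("label", label, file=stream)
        stream.flush()
        return stream.buffer.getvalue()
    except UnicodeEncodeError:
        # The stream cannot represent the character, as with an ASCII console.
        return None

ascii_result = try_print("ascii")
utf8_result = try_print("utf-8")
print(ascii_result, utf8_result)
```

PYTHONIOENCODING overrides the encoding Python picks for stdin, stdout and stderr, which is why setting it changes the outcome without touching the application code.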
Everywhere in Django where you use Unicode text, such as in forms, you must prefix the string literal with u. Do this wherever such a string is stored.
In this case I think that applies to option_label.

Internal Server Error 500 - Python, CGI

My .py file executes ok in terminal, but gives this error in the browser
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
...
...
Here is the .py file:
#!/usr/bin/python
import cgi
import cgitb; cgitb.enable()
print "Content-Type: text/html\n\n" # HTML is following
print # blank line, end of headers
print "<TITLE>CGI script output</TITLE>"
print "<H1>This is my first CGI script</H1>"
print "Hello, world!"
Should I be saving this as a .cgi file? I have tried that and get the same error. I have tried many files like this and none work. I am sure the Apache server is working, as there are other .cgi scripts running from the same directory without issues.
I have also tried these interpreter lines:
#!/usr/local/bin/python
#!/usr/bin/local/python
Any help appreciated.
EDIT
error log output:
(2) No such file or directory: exec of '.../.../.../test.py' failed
Premature end of script headers: test.py
Here is something I wrote up a while ago. These are some good things to look for when troubleshooting Python CGI.
There are some tips for getting Python working in CGI.
1. Apache setup (this may be outdated):
Add Python as a CGI handler by modifying the following in the configuration:
Options Indexes FollowSymLinks ExecCGI
AddHandler cgi-script .cgi .py
2. Always browse the pages through Apache:
Note that viewing files in the filesystem through a browser works for most things on an HTML page but will not work for CGI. For scripts to work they must be opened through the htdocs file system. The address line of your browser should look like:
http://127.0.0.1/index.html or
http://localhost/index.html
If you open a file through the file system, CGI will not work, for example if this is in the location bar of your browser:
c:\Apache\htdocs\index.html (or some other example location)
3. Convert end of lines of scripts to Unix format:
Most editors have options to "show end of lines" and a tool to convert between Unix and PC format. You must have the end of lines set to Unix format.
4. State the path to the Python interpreter on the first line of the CGI script:
You must have one of the following lines as the first line of your Python CGI script:
#!C:\Python25\Python.exe
#!/usr/bin/python
The top line is used when you are debugging on a PC and the bottom is for a server such as 1and1. I leave the lines as shown and then edit them once they are up on the server by deleting the first line.
5. Print a content type specifying HTML before printing any other output:
This can be done simply by adding the following line somewhere very early in your script:
print "Content-Type: text/html\n\n"
Note that 2 end of lines are required.
6. Set up Python scripts to give debugging information:
Import the following to get detailed debugging information.
import cgitb; cgitb.enable()
An alternative if cgitb is not available is to do the following:
import sys
sys.stderr = sys.stdout
7. On the server the Python script permissions must be set to execute:
After uploading your files be sure to edit the first line and set the permissions for the file to execute.
8. Check to see if you can hit the Python script directly. If you can't, fix with the above steps (2-6). Then when the Python script is working, debug the shtml.
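Several of the checks above (Unix line endings, the interpreter line, execute permission) can be automated. The following is a rough Python 3 sketch; check_cgi_script is a hypothetical helper written for this note, not part of Apache or the cgi module, and it only covers the file-level causes that typically sit behind an "exec of ... failed" entry in the error log.

```python
import os
import stat
import tempfile

def check_cgi_script(path):
    """Hypothetical helper: list likely reasons a CGI script fails to exec."""
    problems = []
    with open(path, "rb") as f:
        data = f.read()
    first_line = data.split(b"\n", 1)[0]

    # A trailing \r becomes part of the interpreter path the kernel looks up.
    if b"\r" in data:
        problems.append("CRLF line endings")

    if not first_line.startswith(b"#!"):
        problems.append("missing #! line")
    else:
        parts = first_line[2:].split()
        if not parts or not os.path.exists(parts[0].decode()):
            problems.append("interpreter not found")

    # Only the permission bits are inspected, not the web server's user.
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if not os.stat(path).st_mode & exec_bits:
        problems.append("not executable")
    return problems

# Demo on a throwaway file saved with Windows line endings.
demo = os.path.join(tempfile.mkdtemp(), "test.py")
with open(demo, "wb") as f:
    f.write(b"#!/usr/bin/python\r\nprint('hi')\r\n")
print(check_cgi_script(demo))
```

Run it on the server against the real script path; an empty list means none of these file-level causes apply and the Apache configuration is the next thing to look at.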
