About Google App Engine
This post is about a problem where Japanese CSV data cannot be uploaded with the Google App Engine bulk loader.
The upload fails with the error below.
[home]>appcfg.py upload_data --config_file=src/state_loader.py --filename=ken_all.csv --kind=State --url=[site_url]/remote_api [GoogleAppEngine]
C:\Program Files\Google\google_appengine\appcfg.py:41: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
  os.path.join(DIR_PATH, 'lib', 'antlr3'),
Application: oesoesoes; version: 1.
Uploading data records.
[INFO ] Logging to bulkloader-log-20100110.103751
[INFO ] Throttling transfers:
[INFO ] Bandwidth: 250000 bytes/second
[INFO ] HTTP connections: 8/second
[INFO ] Entities inserted/fetched/modified: 20/second
[INFO ] Opening database: bulkloader-progress-20100110.103751.sql3
[INFO ] Connecting to oesoesoes.appspot.com/remote_api
[INFO ] Starting import; maximum 10 entities per post
[ERROR ] [WorkerThread-0] WorkerThread:
Traceback (most recent call last):
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\adaptive_thread_pool.py", line 150, in WorkOnItems
    status, instruction = item.PerformWork(self.__thread_pool)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 671, in PerformWork
    transfer_time = self._TransferItem(thread_pool)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 828, in _TransferItem
    self.request_manager.PostEntities(self.content)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 1252, in PostEntities
    datastore.Put(entities)
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore.py", line 195, in Put
    req.entity_list().extend([e._ToPb() for e in entities])
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore.py", line 576, in _ToPb
    properties = datastore_types.ToPropertyPb(name, values)
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore_types.py", line 1577, in ToPropertyPb
    pbvalue = pack_prop(name, v, pb.mutable_value())
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore_types.py", line 1409, in PackString
    pbvalue.set_stringvalue(value.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
[INFO ] Backing off due to errors: 1.0 seconds
[INFO ] An error occurred. Shutting down...
[ERROR ] Error in WorkerThread-0: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
[INFO ] 1 entites total, 0 previously transferred
[INFO ] 0 entities (706 bytes) transferred in 1.6 seconds
[INFO ] Some entities not successfully transferred
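I assumed bulkloader.py was at fault, but the culprit appears to be datastore_types.py, which is called further down the stack.
To investigate, I made the change below, and the upload just worked...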
Line 1409
pbvalue.set_stringvalue(value.encode('utf-8'))
↓
Line 1409
pbvalue.set_stringvalue(value)
Huh!?
Why does it raise a UnicodeDecodeError at the very spot where the value is being explicitly encoded?
And yes, of course the CSV file is saved as UTF-8.
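One likely explanation, offered as a sketch and not a confirmed diagnosis: in Python 2, calling .encode('utf-8') on a byte string (str) instead of a unicode object first decodes it implicitly with the default ASCII codec. That hidden decode step would fail on 0xe3, which is the first byte of UTF-8 encoded kana. The snippet below spells out those two steps so it also runs on Python 3. py2_style_encode and the sample value are hypothetical and not part of the SDK.

```python
def py2_style_encode(value, encoding="utf-8"):
    """Imitate Python 2's str.encode(): implicit ASCII decode, then encode."""
    if isinstance(value, bytes):
        # Python 2 performs this decode silently when value is a byte string.
        value = value.decode("ascii")
    return value.encode(encoding)


# Hypothetical cell value, as raw bytes read from a UTF-8 CSV file.
raw = "ホッカイドウ".encode("utf-8")

try:
    py2_style_encode(raw)
    error_message = None
except UnicodeDecodeError as exc:
    # The hidden ASCII decode fails on the first non-ASCII byte.
    error_message = str(exc)

# Decoding to text before the encode step avoids the failure.
round_trip = py2_style_encode(raw.decode("utf-8"))
```

If this is what happens, it would also explain why dropping the encode call works: the value is already a UTF-8 byte string, so it can be stored as-is.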
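I'd like to dig into this further, but the uploader is busy registering the data right now, so that will have to wait until next time.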
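A less invasive alternative to editing the SDK might be to decode each CSV value to text in the loader configuration, so that the later encode('utf-8') receives text, not bytes. The sketch below only models that idea in plain Python and does not import the SDK. The column names, utf8_text and convert_row are assumptions, since state_loader.py is not shown here. In the Python 2 SDK the matching converter would presumably be something like lambda x: unicode(x, 'utf-8') in the loader's property list.

```python
def utf8_text(raw):
    """Converter: turn UTF-8 bytes from the CSV into a text value."""
    return raw.decode("utf-8") if isinstance(raw, bytes) else raw


# Hypothetical (property name, converter) pairs, one per CSV column.
PROPERTIES = [("code", utf8_text), ("name", utf8_text)]


def convert_row(row):
    """Apply each column's converter to the matching raw cell."""
    return {name: conv(cell) for (name, conv), cell in zip(PROPERTIES, row)}


# Hypothetical one-line CSV payload, encoded the way the file is saved.
sample = "01,ホッカイドウ\n".encode("utf-8")
rows = [line.split(b",") for line in sample.splitlines()]
entity = convert_row(rows[0])

# Text values encode cleanly, so no implicit ASCII decode is involved.
encoded_name = entity["name"].encode("utf-8")
```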