File.getvalue() UnicodeEncodeError parsing .GEF files

Which tool versions are you using?

SDK: v13.7.1
Platform: v2022.x.x
Python: v3.10
Isolation mode: venv/docker

Current Behavior

At:

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

Controller code:

    @ParamsFromFile(max_size=int(CPT_FILE_MAX_SIZE_IN_BYTES), file_types=[".GEF"])
    def process_file(self, file: File, **kwargs) -> Dict:
        print("STARTING")
        file_content = file.getvalue()
        print("file.getvalue() fixed")

for File:
2021-0255_2.GEF (103.8 KB)

Expected Behavior

It should parse the file as it always has done.

Did something change in the File.getvalue() method?

This shouldn’t happen

Hi Johan,

Thanks for posting this.

This problem arises from the fact that the Python community is moving away from the ‘standard’ system encoding (which on Windows computers is ‘CP-1252’) as the default, towards using ‘utf-8’ as the default encoding. You can read about this in PEP 686.

But since GEF files are encoded in CP-1252, this gives an error when the file is decoded using ‘utf-8’, which your code does implicitly by not specifying an encoding.
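To make the mismatch concrete, here is a minimal sketch in plain Python. The sample text is made up and this does not use the VIKTOR SDK. The last step shows one plausible route to a UnicodeEncodeError, based on the errors='replace' decoder and the cp1252 encoder seen in this thread. That route is an assumption, not confirmed SDK behaviour.

```python
# Hypothetical GEF header text containing a non-ASCII character (ë),
# stored as CP-1252 bytes, as it would be on disk.
raw = "#COMPANYID= Belgi\u00eb 2021".encode("cp1252")

# 1) Strict UTF-8 decoding of CP-1252 bytes fails on the ë byte (0xEB).
try:
    raw.decode("utf-8")
    strict_utf8_failed = False
except UnicodeDecodeError:
    strict_utf8_failed = True

# 2) Lenient decoding does not fail, but swaps the ë for U+FFFD.
lenient_text = raw.decode("utf-8", errors="replace")

# 3) U+FFFD does not exist in CP-1252, so encoding the damaged text
#    back to CP-1252 (e.g. when printing on a Windows console) fails.
try:
    lenient_text.encode("cp1252")
    reencode_failed = False
except UnicodeEncodeError:
    reencode_failed = True

# 4) Decoding with the encoding the file was written in just works.
correct_text = raw.decode("cp1252")

print(strict_utf8_failed, reencode_failed, correct_text)
```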

So when handling files in ‘text’ mode, an encoding should ALWAYS be specified. This holds for the VIKTOR File object as well as general Python file handling with open().

So when you use:

file.getvalue(encoding='cp1252')

the problem will probably go away.
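The same rule for plain open() could look like the sketch below. The temporary file and its content are made up for the example and only stand in for a real GEF file.

```python
import os
import tempfile

# Hypothetical header line with a non-ASCII character.
sample = "#COMPANYID= Belgi\u00eb\n"

# Write the sample to a temporary .GEF file as CP-1252 bytes.
with tempfile.NamedTemporaryFile(suffix=".GEF", delete=False) as tmp:
    tmp.write(sample.encode("cp1252"))
    path = tmp.name

try:
    # Always pass the encoding explicitly instead of relying on the default.
    # newline="" keeps line endings exactly as stored in the file.
    with open(path, "r", encoding="cp1252", newline="") as handle:
        content = handle.read()
finally:
    os.remove(path)

print(content)
```

Without encoding="cp1252" the result depends on the platform default, which is exactly what PEP 686 is changing.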

Hope this solves your problem.

Sidenote: if the CUR standard is followed to the letter, a GEF should only contain characters from the original 128-character set of ASCII (which is even specified in the GEF itself on line 13: #DATAFORMAT= ASCII). But this GEF contains the ë character, which is not in ASCII. The ASCII characters will not give any problems, since they are mapped to the same bytes in ‘almost’ any encoding (to prevent encoding errors :wink: )
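A quick way to check the sidenote in plain Python (the header line is taken from the GEF, the rest is illustration only):

```python
ascii_line = "#DATAFORMAT= ASCII"

# Pure ASCII text yields identical bytes in all of these encodings.
encodings = ["ascii", "cp1252", "latin-1", "utf-8"]
ascii_variants = {ascii_line.encode(name) for name in encodings}

# The ë character is where the encodings start to differ.
e_cp1252 = "\u00eb".encode("cp1252")  # single byte
e_utf8 = "\u00eb".encode("utf-8")     # two bytes

# ë cannot be represented in ASCII at all.
try:
    "\u00eb".encode("ascii")
    e_in_ascii = True
except UnicodeEncodeError:
    e_in_ascii = False

print(len(ascii_variants), e_cp1252, e_utf8, e_in_ascii)
```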

It doesn’t work:

    @ParamsFromFile(max_size=int(CPT_FILE_MAX_SIZE_IN_BYTES), file_types=[".GEF"])
    def process_file(self, file: File, **kwargs) -> Dict:
        file_content = file.getvalue(encoding='cp-1252')

The following LookupError is returned:

2023-02-02 13:07:02.476 INFO    : Job (uid: 137545) received - EntityType: CPT - call: process_file
2023-02-02 13:07:03.271 ERROR   : Exception is raised
Traceback (most recent call last):
  File "viktor_connector\connector.pyx", line 288, in connector.Job.execute
  File "viktor\core.pyx", line 1881, in viktor.core._handle_job
  File "viktor\core.pyx", line 1850, in viktor.core._handle_job._handle_params_from_file
  File "viktor\core.pyx", line 286, in viktor.core.ParamsFromFile._wrapper
  File "C:\Users\johan.tuls\PythonProjects\CPT Intepretation\app\cpt_module\viktor_entities\cpt\cpt_controller.py", line 87, in process_file
    file_content = file.getvalue(encoding='cp-1252')
  File "viktor\core.pyx", line 947, in viktor.core.File.getvalue
  File "viktor\core.pyx", line 948, in viktor.core.File.getvalue
  File "viktor\core.pyx", line 616, in viktor.core._TextURLFile.read
  File "viktor\core.pyx", line 561, in viktor.core._ResponseStream.read
  File "viktor\core.pyx", line 592, in viktor.core._ResponseStream._load_all
  File "C:\Users\johan.tuls\PythonProjects\CPT Intepretation\venv\lib\site-packages\viktor\_vendor\requests\utils.py", line 540, in stream_decode_response_unicode
    decoder = codecs.getincrementaldecoder(r.encoding)(errors='replace')
  File "C:\Users\johan.tuls\AppData\Local\Programs\Python\Python310\lib\codecs.py", line 1000, in getincrementaldecoder
    decoder = lookup(encoding).incrementaldecoder
LookupError: unknown encoding: cp-1252

Any idea?

It works using “windows-1252”

Hi Johan,

Sorry for that, it works with cp1252 or windows-1252, but I came up with the hybrid cp-1252. :pensive:
Good that it is fixed now. I corrected the original message. More about available encodings here.
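If you are unsure whether Python knows an encoding name, you can check it up front with codecs.lookup from the standard library. The helper name is_known_encoding below is just something made up for this example.

```python
import codecs


def is_known_encoding(name: str) -> bool:
    """Return True if Python can resolve the given encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Both accepted spellings resolve to the same underlying codec.
canonical_cp = codecs.lookup("cp1252").name
canonical_win = codecs.lookup("windows-1252").name

print(canonical_cp, canonical_win)
print(is_known_encoding("cp-1252"))  # the hybrid spelling is not an alias
```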

Maarten


Thanks for the link to the encodings. It allows me to learn something new each day :wink: