Description of the limitation and why it is relevant to address
As a developer, I want the File class to detect the file encoding automatically, so that we don’t have to write our own functions for such a basic and common action.
I think this is relevant for the VIKTOR platform because all developers read files, and we often just call file.getvalue(encoding=None). However, this does not properly detect the encoding of the file.
Submitter proposed design (optional)
We currently use the function below to detect the encoding. It would be nice if passing encoding=None called this function in the background.
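import chardet


def _get_file_encoding(self, file) -> str:
    """
    Return the detected encoding of the file.

    Raises a UnicodeDecodeError if the file cannot be decoded with the detected encoding.
    """
    raw_data = file.getvalue_binary()
    # chardet.detect returns a dict; its "encoding" entry holds the best guess
    encoding_result = chardet.detect(raw_data)
    encoding = encoding_result["encoding"]
    try:
        # Decode once to verify that the detected encoding actually works
        file.getvalue(encoding=encoding)
        return encoding
    except UnicodeDecodeError as err:
        # UnicodeDecodeError takes five arguments, so reuse the original ones and replace only the reason
        raise UnicodeDecodeError(
            err.encoding, err.object, err.start, err.end,
            "The given file is encoded with an unsupported encoding",
        ) from err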
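To illustrate what "encoding=None calls this function in the background" could mean, here is a minimal sketch that works on raw bytes instead of a File object. The names decode_with_detection, _chardet_detect and the detect parameter are hypothetical and not part of the VIKTOR SDK. The sketch only assumes that the detector returns a dict with an "encoding" entry, as chardet.detect does.

```python
from typing import Callable, Optional


def _chardet_detect(raw: bytes) -> dict:
    # Imported lazily so chardet is only needed when no other detector is passed in.
    import chardet

    return chardet.detect(raw)


def decode_with_detection(
    raw: bytes,
    encoding: Optional[str] = None,
    detect: Callable[[bytes], dict] = _chardet_detect,
) -> str:
    """Decode raw bytes, detecting the encoding when none is given (hypothetical helper)."""
    if encoding is None:
        # The "encoding" entry can be None when the detector has no guess,
        # for example for empty input.
        encoding = detect(raw).get("encoding")
        if encoding is None:
            raise ValueError("Could not detect the file encoding")
    return raw.decode(encoding)
```

With this shape, an explicit encoding still takes precedence and detection only runs as a fallback, so existing calls that pass an encoding would behave the same. Making the detector a parameter is a design choice for the sketch only; it keeps the example testable without chardet installed.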
Current workarounds
Use the custom-made function above.