Encoding/decoding differences between Windows and Linux

Hi.
I have made an app where the user uploads a CSV file. The file is read into a DataFrame, some manipulation is carried out, and the result is then downloaded as a CSV file.

The issue I am having is that some of the CSV files contain symbols (the diameter symbol, etc.). The files are Windows (CR LF) ANSI encoded, so I specify this encoding when reading the file, and I use the same encoding when writing and downloading it. However, what works in the local environment (Windows) does not work in the live environment (Linux). I have tried a few things, but every time I get the same result: incorrect symbols.

I imagine this is a fairly common issue that other people have already solved, so I'm hoping someone can help me out.

Thanks in advance

Hi Natalie,

I believe the following should work:

import viktor as vkt
import pandas as pd


class Parametrization(vkt.Parametrization):
    """Simple parametrization for CSV file upload and download"""
    
    # File upload field for CSV files
    csv_file = vkt.FileField("Upload CSV File", file_types=[".csv"])
    
    # Download button to download the modified file
    download_btn = vkt.DownloadButton("Download CSV with Added Row", method="download_csv")


class Controller(vkt.Controller):
    """Controller to handle CSV file upload and download functionality"""
    
    parametrization = Parametrization

    def download_csv(self, params, **kwargs):
        """
        Download method that reads the uploaded CSV file, adds a new row, 
        and returns the modified file for download.
        
        Args:
            params: The parametrization values containing the uploaded file
            
        Returns:
            DownloadResult: The modified CSV file ready for download
        """
        # Check if a file has been uploaded
        if params.csv_file:
            # Get the uploaded file and read it as CSV
            uploaded_file = params.csv_file.file
            
            # Read the CSV file into a pandas DataFrame with Windows encoding
            with uploaded_file.open(encoding='cp1252') as f:
                df = pd.read_csv(f)
            
            # Add a new row with sample data for the 3 columns
            new_row = pd.DataFrame([["Added_Data_1", "Added_Data_2", "Added_Data_3"]], 
                                 columns=df.columns)
            df = pd.concat([df, new_row], ignore_index=True)
            
            # Convert the modified DataFrame back to CSV format with Windows encoding
            modified_file = vkt.File()
            with modified_file.open_binary() as f:
                df.to_csv(f, index=False, encoding='cp1252')

            # Generate filename for the modified file
            original_filename = params.csv_file.filename or "uploaded_file.csv"
            modified_filename = original_filename.replace(".csv", "_modified.csv")
            
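            # Return the modified file for download
            return vkt.DownloadResult(
                file_content=modified_file,
                file_name=modified_filename
            )

        # No file uploaded: show a message instead of silently returning None
        raise vkt.UserError("Please upload a CSV file first.")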

I had a similar pain point trying to upload CSV files. The solution above has worked for me so far, but I was confused about why the standard UTF-8 encoding did not work, and it took me some time to debug what was happening. Some documentation or suggested best practices would be welcome.
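
For anyone else wondering why UTF-8 fails here, the snippet below is a minimal sketch using only the Python standard library, not anything specific to VIKTOR or pandas. The sample text and the helper `decode_or_none` are made up for illustration. It shows that the single byte cp1252 uses for the diameter-like symbol "Ø" is not valid UTF-8, so the same bytes decode correctly with `cp1252` but fail (or turn into a replacement character) with `utf-8`. When no encoding is given, Python generally falls back to a platform-dependent default, which is usually a Windows code page on Windows and UTF-8 on Linux, and that likely explains why code can behave differently between the two environments.

```python
import locale

# Bytes as they would appear in a Windows "ANSI" (cp1252) CSV file.
# The sample text is hypothetical.
raw = "Pipe Ø 50 mm".encode("cp1252")


def decode_or_none(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid for that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Decoding with the encoding the file was written in recovers the symbol.
as_cp1252 = decode_or_none(raw, "cp1252")

# Strict UTF-8 decoding fails, because the cp1252 byte for "Ø" is not valid UTF-8 here.
as_utf8 = decode_or_none(raw, "utf-8")

# Lenient UTF-8 decoding does not fail, but the symbol becomes U+FFFD.
lossy = raw.decode("utf-8", errors="replace")

# The default used when no encoding is passed differs per platform.
print("Platform default encoding:", locale.getpreferredencoding(False))
print(as_cp1252, as_utf8, lossy, sep=" | ")
```

The practical takeaway seems to be to always pass the encoding explicitly, both when reading and when writing, rather than relying on the platform default.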