Hi.
I have made an app where the user uploads a CSV file. The file is read into a DataFrame, some manipulation is carried out, and the result is then downloaded as a CSV.
The issue I am having is that some of the CSVs contain symbols (the diameter symbol, etc.). The files are Windows (CR LF) ANSI encoded, so I have specified this encoding when reading the file, and I use the same encoding when writing and downloading it. However, what works in the local environment (Windows) does not work in the live environment (Linux). I have tried a few things, but every time I get the same result: incorrect symbols in the downloaded file.
I imagine this is a fairly common issue that other people have already solved, so I'm hoping someone can help me out.
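To make the intended behaviour concrete, here is a minimal sketch of the same round trip done purely on bytes, so that nothing depends on the operating system's default encoding. It assumes that "ANSI" here means Windows-1252 (`cp1252`); the helper name `roundtrip_csv` and the sample data are made up for illustration and are not part of the app.

```python
import io

import pandas as pd

ENCODING = "cp1252"  # assumption: the "ANSI" files are Windows-1252


def roundtrip_csv(raw: bytes, encoding: str = ENCODING) -> bytes:
    """Decode CSV bytes, append one row, and re-encode with the same codec."""
    # Let pandas decode the raw bytes with the explicitly named codec.
    df = pd.read_csv(io.BytesIO(raw), encoding=encoding)

    # Append one placeholder row that matches the existing columns.
    new_row = pd.DataFrame([["Added"] * len(df.columns)], columns=df.columns)
    df = pd.concat([df, new_row], ignore_index=True)

    # Build the CSV as text first, then encode explicitly, so the output
    # bytes do not depend on the platform's default encoding.
    return df.to_csv(index=False).encode(encoding)


if __name__ == "__main__":
    sample = "Name,Size\r\nPipe,\u00d850\r\n".encode(ENCODING)
    print(roundtrip_csv(sample))
```

If this produces the right bytes on both Windows and Linux, the remaining difference is probably in how the file is opened or served rather than in pandas itself.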
import viktor as vkt
import pandas as pd


class Parametrization(vkt.Parametrization):
    """Simple parametrization for CSV file upload and download"""
    # File upload field for CSV files
    csv_file = vkt.FileField("Upload CSV File", file_types=[".csv"])
    # Download button to download the modified file
    download_btn = vkt.DownloadButton("Download CSV with Added Row", method="download_csv")


class Controller(vkt.Controller):
    """Controller to handle CSV file upload and download functionality"""
    parametrization = Parametrization

    def download_csv(self, params, **kwargs):
        """
        Download method that reads the uploaded CSV file, adds a new row,
        and returns the modified file for download.

        Args:
            params: The parametrization values containing the uploaded file

        Returns:
            DownloadResult: The modified CSV file ready for download
        """
        # Check if a file has been uploaded
        if params.csv_file:
            # Get the uploaded file and read it as CSV
            uploaded_file = params.csv_file.file
            # Read the CSV file into a pandas DataFrame with Windows encoding
            with uploaded_file.open(encoding='cp1252') as f:
                df = pd.read_csv(f)
            # Add a new row with sample data for the 3 columns
            new_row = pd.DataFrame([["Added_Data_1", "Added_Data_2", "Added_Data_3"]],
                                   columns=df.columns)
            df = pd.concat([df, new_row], ignore_index=True)
            # Convert the modified DataFrame back to CSV format with Windows encoding
            modified_file = vkt.File()
            with modified_file.open_binary() as f:
                df.to_csv(f, index=False, encoding='cp1252')
            # Generate filename for the modified file
            original_filename = params.csv_file.filename or "uploaded_file.csv"
            modified_filename = original_filename.replace(".csv", "_modified.csv")
            # Return the modified file for download
            return vkt.DownloadResult(
                file_content=modified_file,
                file_name=modified_filename
            )
I had a similar pain point when uploading CSV files. The solution above has worked for me so far, but I was confused about why the standard UTF-8 encoding did not work, and it took me some time to debug what was happening. Some documentation or suggested best practices would be welcome.
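A likely explanation for the UTF-8 confusion, assuming the uploaded files really are `cp1252`: that codec stores a character such as "Ø" as the single byte `0xD8`, which is not a valid UTF-8 sequence on its own. Decoding with the wrong codec therefore either raises an error or silently yields the wrong characters. The helper name `try_decode` below is hypothetical and only serves the illustration.

```python
from typing import Optional


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid for the codec."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# "Ø50" as it would be stored in a Windows-1252 ("ANSI") file.
ansi_bytes = "\u00d850".encode("cp1252")

# Strict UTF-8 decoding rejects the lone 0xD8 byte.
as_utf8 = try_decode(ansi_bytes, "utf-8")

# Lenient decoding hides the error but loses the symbol (U+FFFD replaces it).
lossy = ansi_bytes.decode("utf-8", errors="replace")

# The opposite mistake: UTF-8 bytes read as cp1252 give mojibake, not an error.
mojibake = "\u00d850".encode("utf-8").decode("cp1252")

print(as_utf8, repr(lossy), repr(mojibake))
```

The last case is the hardest to spot, because no exception is raised and the file just shows two wrong characters where one symbol should be.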