Creating usable pandas DataFrame from uploaded Excel file

pszi.terv · 4 December 2023 17:35

Hi All!

I am starting out with Viktor. My question is:
How exactly do you convert an uploaded Excel file to a Pandas DataFrame, which then can be used to visualize the data?

First step is to upload the Excel file with FileField, that part is easy.
Where do I have to define the custom method which contains the pd.read_excel() function?
Inside the ViktorController class? At the beginning of the code, below the import rows, before the Parametrization class?

I would also like to be able to choose the following for the import (because usually we don’t receive the data in the same formatting):

sheet number (default: 0, so the first sheet of the file) - IntegerField?
column (e.g. ‘A’) - TextField?
start row of the data (‘startrow-1’ in the pd.read_excel() function) - IntegerField?

My base code looks like this in Python (not in function format):

df_z = pd.read_excel(filename, 
                     sheet_name=sheet, 
                     names=['z'], 
                     usecols=col_z_data, 
                     skiprows=(data_start_row-1))

My problem is with the ‘filename’ argument, because it does not work if I directly give the “params.upload_file” variable as an argument to the function.

I tried to replicate the solution given in this topic, but it does not seem to work. VSCode raises a problem that BytesIO is not defined. Is it a separate module which have to be imported at the beginning? If so, do I have to put it in the requirements?

rweigand · 4 December 2023 19:16

Hi @pszi.terv,

Indeed, BytesIO is a class which you can import, it’s part of the io package: from io import BytesIO.

In terms of you broader questions, have you done this tutorial: Tutorial - Data Analysis with Plotly and Pandas | VIKTOR Documentation? It might be a good starting point, because it uses Pandas to read a csv file. You could take that code and try to modify it to using an excel sheet.

To try and answer (some of) your questions:

The pd.read_excel() can be defined where you need it:
- The method can be either in a method on the controller that is triggered. Triggering controller methods is done for example when the user selects a view. You can read more about it here: Fundamentals - Call flow | VIKTOR Documentation Whatever you want to do in a controller method is up to you.
- Where in the file is again up to you, although it’s good practice to not have code before your imports. (in fact, if you code does pd.some_function that will not work untill after the line import pandas as pd). It can be any other location in the file, and you can also split up your code over multiple files. In the end: readabillity is the most important thing here, so i’d opt for in the controller method.
- 1 addendum: if you use the excel-to-df code in multiple controller methods, it would make sense to refactor it into a separate method on your controller, something like:

class ModelController(ViktorController):
    label = 'Model'
    parametrization = Parametrization

    @GeometryView('3D Geometry', duration_guess=1)
    def get_3d_view(self, params, **kwargs):
        excel_file = params.file_field.file
        df = self.get_dataframe_from_excel(excel_file)
        geometry = <something>
        return GeometryResult(geometry)

    def get_data_view(self, params, **kwargs):
        excel_file = params.file_field.file
        df = self.get_dataframe_from_excel(excel_file)
        main_data_group = <something>
        return DataResult(main_data_group)

    @staticmethod
    def get_dataframe_from_excel(excel_file):
        df = pd.read_excel(excel_file.name, some other arguments)
        return df

Obviously you could also place that get_dataframe_from_excel somewhere else, but it might be nice to keep it on the controller just so that you have all your code together in 1 location.

pszi.terv · 4 December 2023 20:30

Hi @rweigand!
Thank you for the answer!

I did the tutorial you mentioned, but it is a bit different, because the .csv file is “hardcoded” in the script, it is in the same folder as the app, so it is always the same.
In my case I would like to deal with an Excel file uploaded through Viktor.

I modified my script according to your answer, but now I am encountering the following error:
AttributeError: 'File' object has no attribute 'name'

According to the terminal, the error is in the get_dataframe_from_excel function:

<my path>, line 40, in get_dataframe_from_excel
    df = pd.read_excel(file.name,
                      ^^^^^^^^^

My Parametrization looks like this (Sections to make the input more structured):

class Parametrization(ViktorParametrization):
    
    section_1 = Section("1.) EXCEL IMPORT")
    section_1.xls_file = FileField("Upload Excel file", file_types=['.xls', '.xlsx'])

Is the input correct?

And the Controller:

class Controller(ViktorController):
    label = 'Plotly Diagram'
    parametrization = Parametrization
    
    @staticmethod
    def get_dataframe_from_excel(file, sheet, colname, column, start_row):
        df = pd.read_excel(file.name, 
                           sheet_name=sheet, 
                           names=[colname], 
                           usecols=column, 
                           skiprows=(start_row-1))
        return df

So it is essentially the same as the code you gave (the argument names are a bit different). After the @staticmethod comes my @PlotlyView, which is the same as in the mentioned tutorial.

Daniel · 5 December 2023 08:51

Hi there,

Try storing the file in a BytesIO buffer first:

class Controller(ViktorController):
    label = 'Plotly Diagram'
    parametrization = Parametrization
    
    @staticmethod
    def get_dataframe_from_excel(params, sheet, colname, column, start_row):
        uploaded_file = params.section_1.xls_file.file
        excel_file = BytesIO(uploaded_file.getvalue_binary())

        df = pd.read_excel(excel_file, 
                           sheet_name=sheet, 
                           names=[colname], 
                           usecols=column, 
                           skiprows=(start_row-1))
        return df

Notice that in this case I got the file directly from the params.

Let me know if this helps!

rweigand · 5 December 2023 13:02

Ah sorry, there’s indeed a mistake in my example. To get the filename from a filefield you can use filename instead of name:

from viktor.parametrization import ViktorParametrization, FileField


class Parametrization(ViktorParametrization):
    file = FileField('CPT file')

then on the controller:

class ModelController(ViktorController):
    label = 'Model'
    parametrization = Parametrization

    @GeometryView('3D Geometry', duration_guess=1)
    def get_3d_view(self, params, **kwargs):
        excel_file = params.file_field.file
        file_name = params.file_field.filename
       
    <...>

pszi.terv · 5 December 2023 17:30

Thank you both @Daniel and @rweigand for your help!

I modified my code according to @Daniel’s solution using BytesIO, but with a slight modification. I did not use “params” as an argument, because then the function expects it when it is called:

@staticmethod
def get_dataframe_from_excel(file, sheet, colname, column, start_row):
        excel_file = BytesIO(file.getvalue_binary())
        
        df = pd.read_excel(excel_file, 
                           sheet_name=sheet, 
                           names=[colname], 
                           usecols=column, 
                           skiprows=(start_row-1),
                           engine='openpyxl')
        return df

# Calling the function inside the @PlotlyView:
df_z = self.get_dataframe_from_excel(file=params.section_1.xls_file.file, ...

A little addition:
I had to install the “openpyxl” library also through the requirements.txt file to make the code work. It gave an error if this is missing.

Now the script works beautifully, and looks like this, thank you again

gfsegre · 23 March 2024 15:17

I have a similar question, but considering plotly. Should i be using openpyxl to be able to process/view the data?

import pandas as pd
import numpy as np
import plotly.graph_objects as go
from viktor import (ViktorController, File, ParamsFromFile)
from viktor.parametrization import ViktorParametrization, FileField
from viktor.views import PlotlyView, PlotlyResult

class Parametrization(ViktorParametrization):
file = FileField("upload your gint export file (.xls format): ")

class Controller(ViktorController):
label = ‘My Entity Type’
parametrization = Parametrization(width=60)

@ParamsFromFile()
def process_file(self, file: File, **kwargs):
    data = pd.read_excel(file)
    return data

@PlotlyView("Results View", duration_guess=1)
def plot_data(self, params, **kwargs):
    uploaded_file = params.file  # Get the uploaded file
    
    # Read the Excel file into a DataFrame
    data = pd.read_excel(uploaded_file)
    
    # Group the data by "ID"
    grouped_data = data.groupby('ID')
    
    fig = go.Figure()
    for group_name, group_data in grouped_data:
        fig.add_trace(go.Scatter(x=group_data['Mc (%)'], 
                                 y=group_data['Elev. (ft)'], 
                                 mode='markers', 
                                 name=f'Boring ID: {group_name}'))
    
    # Add titles and labels
    fig.update_layout(title='Moisture v. Elev.',
                      xaxis_title='Moisture (%)',
                      yaxis_title='Elev. (ft.)')
    
    # Return the PlotlyResult
    return PlotlyResult(fig.to_json())

Blockquote