squadds.core package#

Submodules#

squadds.core.analysis module#

class squadds.core.analysis.Analyzer(db=None)[source]#

Bases: object

The Analyzer class is responsible for analyzing designs and finding the closest designs based on target parameters.

_add_target_params_columns()[source]#

Adds target parameter columns to the dataframe based on the selected system.

_fix_cavity_claw_df()[source]#

Fixes the cavity claw DataFrame by renaming columns and updating values.

_get_H_param_keys()[source]#

Gets the parameter keys for the Hamiltonian based on the selected system.

target_param_keys()[source]#

Returns the target parameter keys.

set_metric_strategy(strategy: MetricStrategy)

Sets the metric strategy to use for calculating the distance metric.

_outside_bounds(df: pd.DataFrame, params: dict, display=True) -> bool

Checks if entered parameters are outside the bounds of a dataframe.

find_closest(target_params: dict, num_top: int, metric: str = 'Euclidean', display: bool = True)

Finds the closest designs in the library based on the target parameters.

get_interpolated_design(target_params: dict, metric: str = 'Euclidean', display: bool = True)

Gets the interpolated design based on the target parameters.

get_design(df)[source]#

Extracts the design parameters from the dataframe and returns a dict.

Initializes an instance of the Analyzer class.

Parameters:

db – The database object.

- db

The database object.

- selected_component_name

The name of the selected component.

- selected_component

The selected component.

- selected_data_type

The selected data type.

- selected_confg

The selected configuration.

- selected_qubit

The selected qubit.

- selected_cavity

The selected cavity.

- selected_coupler

The selected coupler.

- selected_system

The selected system.

- df

The selected dataframe.

- closest_df_entry

The closest dataframe entry.

- closest_design

The closest design.

- presimmed_closest_cpw_design

The presimmed closest CPW design.

- presimmed_closest_qubit_design

The presimmed closest qubit design.

- presimmed_closest_coupler_design

The presimmed closest coupler design.

- interpolated_design

The interpolated design.

- metric_strategy

The metric strategy (will be set dynamically).

- custom_metric_func

The custom metric function.

- metric_weights

The metric weights.

- target_params

The target parameters.

- H_param_keys

The H parameter keys.

closest_design_in_H_space()[source]#

Plots a scatter plot of the closest design in the H-space.

This method creates a scatter plot with two subplots. The first subplot shows the relationship between ‘cavity_frequency_GHz’ and ‘kappa_kHz’, while the second subplot shows the relationship between ‘anharmonicity_MHz’ and ‘g_MHz’. The scatter plot includes pre-simulated data, target data, and the closest design entry from the database.

Returns:

None

compute_metric_distances(row)[source]#
find_closest(target_params, num_top, metric='Euclidean', display=True, parallel=False, num_cpu='auto', skip_df_gen=False)[source]#

Find the closest designs in the library based on the target parameters.

Parameters:
  • target_params – A dictionary containing the target parameters.

  • num_top – The number of closest designs to retrieve.

  • metric – The distance metric to use for calculating distances. Defaults to ‘Euclidean’.

  • display – Whether to display warnings for parameters outside of the library bounds. Defaults to True.

  • parallel – Whether to run the metric calculation in parallel. Defaults to False.

  • num_cpu – The number of CPUs to use for the parallel calculation. Defaults to ‘auto’.

  • skip_df_gen – Whether to skip generating the DataFrame and reuse the one already in memory. Defaults to False.

Returns:

A DataFrame containing the closest designs.

Return type:

  • closest_df (DataFrame)

Raises:
  • ValueError – If the specified metric is not supported or if num_top is bigger than the size of the library.

  • ValueError – If the metric is invalid.
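The ranking behaviour described above can be sketched without the library. This is a minimal illustration, not the Analyzer implementation: `find_closest_sketch`, `abs_difference`, and the three-row `library` (made-up values) are hypothetical, and the real method works on a DataFrame and returns a DataFrame.

```python
def abs_difference(target_params, row):
    """Hypothetical stand-in metric: sum of absolute differences."""
    return sum(abs(row[key] - value) for key, value in target_params.items())


def find_closest_sketch(library, target_params, num_top, distance):
    """Return the num_top rows of library with the smallest distance to the target."""
    if num_top > len(library):
        raise ValueError("num_top cannot exceed the number of designs in the library")
    ranked = sorted(library, key=lambda row: distance(target_params, row))
    return ranked[:num_top]


# Made-up rows for illustration only; the column names follow the H-space plot above.
library = [
    {"design": "A", "cavity_frequency_GHz": 6.0, "kappa_kHz": 100.0},
    {"design": "B", "cavity_frequency_GHz": 7.0, "kappa_kHz": 200.0},
    {"design": "C", "cavity_frequency_GHz": 6.5, "kappa_kHz": 120.0},
]
target = {"cavity_frequency_GHz": 6.4, "kappa_kHz": 125.0}

closest = find_closest_sketch(library, target, num_top=2, distance=abs_difference)
print([row["design"] for row in closest])  # ['C', 'A']
```

Passing the distance as a callable mirrors the strategy idea behind set_metric_strategy: the ranking code stays the same while the metric changes.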

get_closest_cavity()[source]#

Returns the closest cavity design.

Returns:

The closest cavity design.

Return type:

pd.Series

get_complete_df(target_params, metric='Euclidean', display=True)[source]#

Returns the complete DataFrame (design + Hamiltonian parameters) sourced using the target parameters.

Parameters:
  • target_params – A dictionary containing the target parameters.

  • metric – The distance metric to use for calculating distances. Defaults to ‘Euclidean’.

  • display – Whether to display warnings for parameters outside of the library bounds. Defaults to True.

Returns:

A DataFrame containing all designs and Hamiltonian parameters.

Return type:

  • complete_df (DataFrame)

Raises:
  • ValueError – If the specified metric is not supported or if num_top is bigger than the size of the library.

  • ValueError – If the metric is invalid.

get_design(df)[source]#

Extracts the design parameters from the dataframe and returns a dict.

Returns:

A dict containing the design parameters.

Return type:

dict

get_interpolated_design(target_params, metric='Euclidean', display=True)[source]#
get_param(design, param)[source]#

Extracts a specific parameter from the design dict.

reload_db()[source]#

Reload the Analyzer with the current singleton SQuADDS_DB object.

set_metric_strategy(strategy)[source]#

Sets the metric strategy to use for calculating the distance metric.

Parameters:

strategy (MetricStrategy) – The strategy to use for calculating the distance metric.

Raises:

ValueError – If the specified metric is not supported.

target_param_keys()[source]#
Returns:

The target parameter keys.

Return type:

list

squadds.core.analysis.scale_value(value, ratio)[source]#

Scales the given value by the specified ratio.

Parameters:
  • value – The value to be scaled, in the format ‘Xum’ where X is a number.

  • ratio – The scaling ratio.

Returns:

The scaled value in the format ‘Xum’ where X is the scaled number.

Return type:

scaled_value (str)
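A minimal sketch of the ‘Xum’ scaling described above. The name `scale_value_sketch` and the exact output formatting (Python's default float formatting) are assumptions; the library function may format the number differently.

```python
import re


def scale_value_sketch(value: str, ratio: float) -> str:
    """Scale a length string such as '10um' by ratio and keep the 'um' suffix."""
    match = re.fullmatch(r"\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*um\s*", value)
    if match is None:
        raise ValueError(f"expected a string like '10um', got {value!r}")
    scaled = float(match.group(1)) * ratio
    return f"{scaled}um"


print(scale_value_sketch("10um", 1.5))  # 15.0um
```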

squadds.core.db module#

!TODO: add FULL support for half-wave cavity

class squadds.core.db.SQuADDS_DB(*args, **kwargs)[source]#

Bases: object

A class representing the SQuADDS database.

supported_components()[source]#

Get a list of supported components.

supported_component_names()[source]#

Get a list of supported component names.

supported_data_types()[source]#

Get a list of supported data types.

_delete_cache()#

Delete the dataset cache directory.

supported_config_names()[source]#

Get a list of supported configuration names.

get_configs()[source]#

Print the supported configuration names.

get_component_names(component)[source]#

Get a list of component names for a given component.

view_component_names(component)[source]#

Print the component names for a given component.

view_datasets()[source]#

Print a table of available datasets.

get_dataset_info(component, component_name, data_type)[source]#

Print information about a specific dataset.

view_all_contributors()[source]#

Print a table of all contributors.

view_contributors_of_config(config)[source]#

Print a table of contributors for a specific configuration.

view_contributors_of(component, component_name, data_type)[source]#

Print a table of contributors for a specific component, component name, and data type.

select_components(component_dict)[source]#

Select a configuration based on a component dictionary or string.

select_system(components)[source]#

Select a system based on a list of components or a single component.

select_qubit(qubit)[source]#

Select a qubit.

select_cavity_claw(cavity)[source]#

Select a cavity.

Constructor for the SQuADDS_DB class.

repo_name#

The name of the repository.

Type:

str

configs#

List of supported configuration names.

Type:

list

selected_component_name#

The name of the selected component.

Type:

str

selected_component#

The selected component.

Type:

str

selected_data_type#

The selected data type.

Type:

str

selected_confg#

The selected configuration.

Type:

str

selected_qubit#

The selected qubit.

Type:

str

selected_cavity#

The selected cavity.

Type:

str

selected_coupler#

The selected coupler.

Type:

str

selected_resonator_type#

The selected resonator type.

Type:

str

selected_system#

The selected system.

Type:

str

selected_df#

The selected dataframe.

Type:

str

target_param_keys#

The target parameter keys.

Type:

str

units#

The units.

Type:

str

_internal_call#

Flag to track internal calls.

Type:

bool

check_login()[source]#

Checks if the user is logged in to Hugging Face.

create_qubit_cavity_df(qubit_df, cavity_df, merger_terms=None, parallelize=False, num_cpu=None)[source]#

Creates a merged DataFrame by merging the qubit and cavity DataFrames based on the specified merger terms.

Parameters:
  • qubit_df (pandas.DataFrame) – The DataFrame containing qubit data.

  • cavity_df (pandas.DataFrame) – The DataFrame containing cavity data.

  • merger_terms (list) – A list of column names to be used for merging the DataFrames. Defaults to None.

  • parallelize (bool) – Whether to use multiprocessing to speed up the merging. Defaults to False.

  • num_cpu (int) – The number of CPU cores to use for multiprocessing. If not specified, the function will use the maximum number of available cores.

Returns:

The merged DataFrame.

Return type:

pandas.DataFrame

Raises:

None

create_system_df(parallelize=False, num_cpu=None)[source]#

Creates and returns a DataFrame based on the selected system.

Parameters:
  • parallelize (bool) – Whether to use multiprocessing to speed up the merging. Defaults to False.

  • num_cpu (int) – The number of CPU cores to use for multiprocessing. If not specified, the function will use the maximum number of available cores.

If the selected system is a single component, it retrieves the dataset based on the selected data type, component, and component name. If a coupler is selected, the DataFrame is filtered by the coupler. The resulting DataFrame is stored in the selected_df attribute.

If the selected system is a list of components (qubit and cavity), it retrieves the qubit and cavity DataFrames. The qubit DataFrame is obtained based on the selected qubit component name and data type “cap_matrix”. The cavity DataFrame is obtained based on the selected cavity component name and data type “eigenmode”. The qubit and cavity DataFrames are merged into a single DataFrame using the merger terms [‘claw_width’, ‘claw_length’, ‘claw_gap’]. The resulting DataFrame is stored in the selected_df attribute.

Raises:

UserWarning – If the selected system is either not specified or does not contain a cavity.

Returns:

The created DataFrame based on the selected system.

Return type:

pandas.DataFrame
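The qubit and cavity merge on [‘claw_width’, ‘claw_length’, ‘claw_gap’] behaves like an inner join on those columns. The sketch below shows that idea on plain lists of dicts; `merge_on_terms` and the sample rows (made-up values) are hypothetical, and the library itself operates on pandas DataFrames.

```python
def merge_on_terms(qubit_rows, cavity_rows, merger_terms):
    """Inner-join two lists of dicts on the columns named in merger_terms."""
    # Index the cavity rows by their merger-term values for fast lookup.
    index = {}
    for cavity in cavity_rows:
        key = tuple(cavity[term] for term in merger_terms)
        index.setdefault(key, []).append(cavity)

    merged = []
    for qubit in qubit_rows:
        key = tuple(qubit[term] for term in merger_terms)
        for cavity in index.get(key, []):
            merged.append({**cavity, **qubit})
    return merged


merger_terms = ["claw_width", "claw_length", "claw_gap"]
# Made-up rows for illustration only.
qubit_rows = [
    {"claw_width": "10um", "claw_length": "100um", "claw_gap": "5um", "anharmonicity_MHz": -200.0},
    {"claw_width": "10um", "claw_length": "150um", "claw_gap": "5um", "anharmonicity_MHz": -180.0},
]
cavity_rows = [
    {"claw_width": "10um", "claw_length": "100um", "claw_gap": "5um", "cavity_frequency_GHz": 6.1},
]

merged_rows = merge_on_terms(qubit_rows, cavity_rows, merger_terms)
print(len(merged_rows))  # 1
```

Only the qubit row whose three claw dimensions match a cavity row survives the join, which is why the merged library can be smaller than either input.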

find_parquet_files()[source]#

Searches for parquet files in the repository and returns their paths/filenames.

Returns:

A list of paths/filenames of parquet files in the repository.

Return type:

list

generate_qubit_half_wave_cavity_df(parallelize=False, num_cpu=None, save_data=False)[source]#

Generates a DataFrame that combines the qubit and half-wave cavity data.

Parameters:
  • parallelize (bool, optional) – Flag indicating whether to parallelize the computation. Defaults to False.

  • num_cpu (int, optional) – Number of CPUs to use for parallelization. Defaults to None.

  • save_data (bool, optional) – Flag indicating whether to save the generated data. Defaults to False.

Returns:

The generated DataFrame.

Return type:

pandas.DataFrame

Raises:

None

Notes

  • This method generates a DataFrame by combining the qubit and half-wave cavity data.

  • The qubit and cavity data are obtained from the get_dataset and generate_updated_half_wave_cavity_df methods, respectively.

  • The generated DataFrame is optimized to reduce memory usage using various optimization techniques.

  • If save_data is True, the generated DataFrames are saved in the “data” directory.

generate_updated_half_wave_cavity_df(parallelize=False, num_cpu=None)[source]#

!TODO: speed this up!

get_component_names(component=None)[source]#

Get the names of the components associated with a specific component.

Parameters:

component (str) – The specific component to retrieve names for.

Returns:

A list of component names associated with the specified component.

Return type:

list

get_configs()[source]#

Returns the configurations stored in the database.

Returns:

A list of configuration names.

Return type:

list

get_dataset(data_type=None, component=None, component_name=None)[source]#

Retrieves a dataset based on the specified data type, component, and component name.

Parameters:
  • data_type (str) – The type of data to retrieve.

  • component (str) – The component to retrieve the data from.

  • component_name (str) – The name of the component to retrieve the data from.

Returns:

The retrieved dataset.

Return type:

pandas.DataFrame

Raises:
  • ValueError – If the system and component name are not defined.

  • ValueError – If the data type is not specified.

  • ValueError – If the component is not supported.

  • ValueError – If the component name is not supported.

  • ValueError – If the data type is not supported.

  • Exception – If an error occurs while loading the dataset.

get_dataset_info(component=None, component_name=None, data_type=None)[source]#

Retrieves and prints information about a dataset.

Parameters:
  • component (str) – The component of the dataset.

  • component_name (str) – The name of the component.

  • data_type (str) – The type of data.

Returns:

None

get_device_contributors_of(component=None, component_name=None, data_type=None)[source]#

View the reference/source experimental device that was used to validate a specific simulation configuration.

Parameters:
  • component (str) – The component of interest.

  • component_name (str) – The name of the component.

  • data_type (str) – The type of data.

Returns:

The relevant contributor information.

Return type:

dict

get_existing_files()[source]#

Retrieves the list of existing files in the repository.

Returns:

A list of existing file names in the repository.

Return type:

list

get_measured_devices()[source]#

Retrieve all measured devices with their corresponding design codes, paper links, images, foundries, and fabrication recipes.

Returns:

A DataFrame containing the name, design code, paper link, image, foundry, and fabrication recipe for each device.

Return type:

pd.DataFrame

read_parquet_file(file_name)[source]#

Takes in the filename and returns the object to be read as a pandas dataframe.

Parameters:

file_name (str) – The name of the parquet file to read.

Returns:

The dataframe read from the parquet file.

Return type:

pandas.DataFrame

see_dataset(data_type=None, component=None, component_name=None)[source]#

View a dataset based on the provided data type, component, and component name.

Parameters:
  • data_type (str) – The type of data to view.

  • component (str) – The component to use. If not provided, the selected system will be used.

  • component_name (str) – The name of the component. If not provided, the selected component name will be used.

Returns:

The flattened dataset.

Return type:

pandas.DataFrame

Raises:
  • ValueError – If both system and component name are not defined.

  • ValueError – If data type is not specified.

  • ValueError – If the component is not supported.

  • ValueError – If the component name is not supported.

  • ValueError – If the data type is not supported.

  • Exception – If an error occurs while loading the dataset.

select_cavity(cavity=None)[source]#

Selects a cavity and sets the necessary attributes for further operations.

Parameters:

cavity (str) – The name of the cavity to be selected.

Raises:

UserWarning – If the selected system is either not specified or does not contain a cavity.

Returns:

None

select_cavity_claw(cavity=None)[source]#

Selects a cavity claw component.

Parameters:

cavity (str) – The name of the cavity to select.

Raises:

UserWarning – If the selected system is not specified or does not contain a cavity.

Returns:

None

select_components(component_dict=None)[source]#

Selects components based on the provided component dictionary or string.

Parameters:

component_dict (dict or str) – A dictionary containing the component details (component, component_name, data_type) or a string representing the component.

Returns:

None

select_coupler(coupler=None)[source]#

Selects a coupler for the database.

Parameters:

coupler (str, optional) – The name of the coupler to select. Defaults to None.

Returns:

None

select_qubit(qubit=None)[source]#

Selects a qubit and sets the necessary attributes for the selected qubit.

Parameters:

qubit (str) – The name of the qubit to be selected.

Raises:

UserWarning – If the selected system is not specified or does not contain a qubit.

Returns:

None

select_resonator_type(resonator_type)[source]#

Select the coupler based on the resonator type.

Parameters:

resonator_type (str) – The type of resonator, e.g., “quarter” or “half”.

select_system(components=None)[source]#

Selects the system and component(s) to be used.

Parameters:

components (list or str) – The component(s) to be selected. If a list is provided, each component will be checked against the supported components. If a string is provided, it will be checked against the supported components.

Returns:

None

Raises:

None

show_selected_system()[source]#
show_selections()[source]#

Prints the selected system, component, and data type.

If the selected system is a list, it prints the selected qubit, cavity, coupler, and system. If the selected system is a string, it prints the selected component, component name, data type, system, and coupler.

supported_component_names()[source]#

Returns a list of supported component names extracted from the configs.

Returns:

A list of supported component names.

Return type:

list

supported_components()[source]#

Returns a list of supported components based on the configurations.

Returns:

A list of supported components.

Return type:

list

supported_config_names()[source]#

Retrieves the supported configuration names from the repository.

Returns:

A list of supported configuration names.

supported_data_types()[source]#

Returns a list of supported data types.

Returns:

A list of supported data types.

Return type:

list

unselect(param)[source]#

Unselects the specified parameter.

Parameters:

param (str) – The parameter to unselect. Valid options are:

  • “component”

  • “component_name”

  • “data_type”

  • “qubit”

  • “cavity_claw”

  • “coupler”

  • “system”

Returns:

None

unselect_all()[source]#

Clears the selected component, data type, qubit, cavity, coupler, and system.

upload_dataset(file_paths, repo_file_names, overwrite=False)[source]#

Uploads a dataset to the repository.

Parameters:
  • file_paths (list) – A list of file paths to upload.

  • repo_file_names (list) – A list of file names to use in the repository.

  • overwrite (bool) – Whether to overwrite an existing dataset. Defaults to False.

view_all_contributors()[source]#

View all unique contributors and their relevant information from simulation configurations.

This method iterates through the simulation configurations and extracts the relevant information of each contributor. It checks if the combination of uploader, PI, group, and institution is already in the list of unique contributors. If not, it adds the relevant information to the list. Finally, it prints the list of unique contributors in a tabular format with a banner.

view_all_simulation_contributors()[source]#

View all unique simulation contributors and their relevant information.

view_component_names(component=None)[source]#

Prints the names of the components available in the database.

Parameters:

component (str) – The specific component to view names for. If None, all component names will be printed.

Returns:

None

view_contributors_of(component=None, component_name=None, data_type=None, measured_device_name=None)[source]#

View contributors of a specific component, component name, and data type.

Parameters:
  • component (str) – The component of interest.

  • component_name (str) – The name of the component.

  • data_type (str) – The type of data.

  • measured_device_name (str) – The name of the measured device.

Returns:

None

view_contributors_of_config(config)[source]#

View the contributors of a specific configuration.

Parameters:

config (str) – The name of the configuration.

Returns:

None

view_datasets()[source]#

View the datasets available in the database.

This method retrieves the supported components, component names, and data types from the database and displays them in a tabular format.

view_device_contributors_of(component=None, component_name=None, data_type=None)[source]#

View the reference/source experimental device that was used to validate a specific simulation configuration.

Parameters:
  • component (str) – The component of interest.

  • component_name (str) – The name of the component.

  • data_type (str) – The type of data.

Returns:

The name of the experimentally validated reference device, or an error message if not found.

Return type:

str

view_measured_devices()[source]#

View all measured devices with their corresponding design codes, paper links, images, foundries, and fabrication recipes.

This method retrieves and displays the relevant information for each device in the dataset in a well-formatted table.

view_recipe_of(device_name)[source]#

Retrieve the foundry and fabrication recipe information for a specified device.

Parameters:

device_name (str) – The name of the device to retrieve information for.

Returns:

A dictionary containing foundry and fabrication recipe information.

Return type:

dict

view_reference_device_of(component=None, component_name=None, data_type=None)[source]#

View the reference/source experimental device that was used to validate a specific simulation configuration.

Parameters:
  • component (str) – The component of interest.

  • component_name (str) – The name of the component.

  • data_type (str) – The type of data.

view_reference_devices()[source]#

View all unique reference (experimental) devices and their relevant information.

This method iterates through the configurations and extracts the chip’s name within the SQuADDS DB, the group, and who measured the chip. It also finds the simulation results for the device. It then checks whether the combination of simulation results uploader, PI, group, and institution is already in the list of unique contributors and, if not, adds the relevant information to the list. Finally, it prints the list of unique devices in a tabular format.

view_simulation_results(device_name)[source]#

View the simulation results of a specific device specified with a device name.

Parameters:

device_name (str) – the name of the experimentally validated device within the database.

Returns:

a dict of sim results.

Return type:

dict

squadds.core.design_patterns module#

class squadds.core.design_patterns.SingletonMeta[source]#

Bases: type

Metaclass for implementing the Singleton design pattern.
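For readers unfamiliar with the pattern, a common way to write a singleton metaclass in Python is sketched below. `SingletonMetaSketch` and `DatabaseHandle` are hypothetical names; this is the textbook form, not necessarily how SingletonMeta is written, and it is not thread-safe.

```python
class SingletonMetaSketch(type):
    """Metaclass that hands back one shared instance per class."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Build the instance on the first call only; later calls reuse it.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class DatabaseHandle(metaclass=SingletonMetaSketch):
    def __init__(self, repo_name="example/repo"):
        self.repo_name = repo_name


first = DatabaseHandle("example/repo")
second = DatabaseHandle("another/repo")  # arguments are ignored: instance already exists
print(first is second)   # True
print(second.repo_name)  # example/repo
```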

squadds.core.globals module#

squadds.core.metrics module#

class squadds.core.metrics.ChebyshevMetric[source]#

Bases: MetricStrategy

Implements the Chebyshev metric strategy.

calculate(target_params, df_row)[source]#

Calculate the Chebyshev distance between target_params and df_row.

Parameters:
  • target_params (dict) – The target parameters as a dictionary.

  • df_row (pd.Series) – A single row from a DataFrame representing a set of parameters.

Returns:

The Chebyshev distance.

Return type:

float
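The Chebyshev distance is the largest absolute per-parameter difference. A minimal sketch on plain dicts follows; `chebyshev_distance` is a hypothetical helper, and whether the library rescales the parameters first is not stated here.

```python
def chebyshev_distance(target_params, row):
    """Largest absolute difference over the target parameter keys."""
    return max(abs(row[key] - value) for key, value in target_params.items())


target = {"g_MHz": 70.0, "kappa_kHz": 100.0}
row = {"g_MHz": 73.0, "kappa_kHz": 101.0}
print(chebyshev_distance(target, row))  # 3.0
```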

class squadds.core.metrics.CustomMetric(custom_metric_func)[source]#

Bases: MetricStrategy

Implements a custom metric strategy using a user-defined function.

Example Usage:

To use a custom Manhattan distance metric, define the function as follows:

    def manhattan_distance(target, simulated):
        return sum(abs(target[key] - simulated.get(key, 0)) for key in target)

Then, instantiate CustomMetric with this function:

    from squadds.core.metrics import CustomMetric

    custom_metric = CustomMetric(manhattan_distance)

Initialize CustomMetric with a custom metric function.

Parameters:

custom_metric_func (callable) – User-defined custom metric function. The function should take two dictionaries as arguments and return a float.

calculate(target_params, df_row)[source]#

Calculate the custom metric between target_params and df_row using the user-defined function.

Parameters:
  • target_params (dict) – The target parameters as a dictionary.

  • df_row (pd.Series) – A single row from a DataFrame representing a set of parameters.

Returns:

The custom metric calculated using the user-defined function.

Return type:

float
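A self-contained sketch of how a wrapper of this kind can delegate to the user function. `CustomMetricSketch` is a hypothetical stand-in rather than the library class, and converting the row with `dict(...)` is an assumption made so that the function receives two dictionaries.

```python
class CustomMetricSketch:
    """Wraps a user-defined function that maps (target, row) to a float."""

    def __init__(self, custom_metric_func):
        if not callable(custom_metric_func):
            raise TypeError("custom_metric_func must be callable")
        self.custom_metric_func = custom_metric_func

    def calculate(self, target_params, df_row):
        # dict(...) accepts a plain dict as well as a mapping-like row.
        return float(self.custom_metric_func(target_params, dict(df_row)))


def manhattan_distance(target, simulated):
    return sum(abs(target[key] - simulated.get(key, 0)) for key in target)


metric = CustomMetricSketch(manhattan_distance)
row = {"g_MHz": 68.0, "kappa_kHz": 110.0}
print(metric.calculate({"g_MHz": 70.0, "kappa_kHz": 100.0}, row))  # 12.0
```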

class squadds.core.metrics.EuclideanMetric[source]#

Bases: MetricStrategy

Implements the Euclidean metric strategy, with each squared difference normalized by its target value.

calculate(target_params, df_row)[source]#

Calculate the custom Euclidean distance between target_params and df_row.

The Euclidean distance is calculated as: sqrt(sum_i (x_i - x_{target})^2 / x_{target}), where x_i are the values in df_row and x_{target} are the target parameters.

Parameters:
  • target_params (dict) – The target parameters as a dictionary.

  • df_row (pd.Series) – A single row from a DataFrame representing a set of parameters.

Returns:

The custom Euclidean distance.

Return type:

float
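The formula above can be worked through on plain dicts. This sketch assumes the division by the target sits inside the sum and that every target value is positive (a negative target could make the sum negative, which `math.sqrt` rejects); `normalized_euclidean` is a hypothetical helper, not the library method.

```python
import math


def normalized_euclidean(target_params, row):
    """sqrt of the sum over keys of (x_i - x_target)^2 / x_target."""
    total = 0.0
    for key, x_target in target_params.items():
        total += (row[key] - x_target) ** 2 / x_target
    return math.sqrt(total)


target = {"cavity_frequency_GHz": 4.0, "kappa_kHz": 100.0}
row = {"cavity_frequency_GHz": 6.0, "kappa_kHz": 110.0}
# (2^2)/4 + (10^2)/100 = 2, so the distance is sqrt(2).
print(normalized_euclidean(target, row))
```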

class squadds.core.metrics.ManhattanMetric[source]#

Bases: MetricStrategy

Implements the Manhattan metric strategy.

calculate(target_params, df_row)[source]#

Calculate the Manhattan distance between target_params and df_row.

Parameters:
  • target_params (dict) – The target parameters as a dictionary.

  • df_row (pd.Series) – A single row from a DataFrame representing a set of parameters.

Returns:

The Manhattan distance.

Return type:

float

class squadds.core.metrics.MetricStrategy[source]#

Bases: ABC

Abstract class for metric strategies.

abstract calculate(target_params, row)[source]#

Calculate the distance metric between target parameters and a DataFrame row.

Parameters:
  • target_params (dict) – Dictionary of target parameters.

  • row (pd.Series) – A row from a DataFrame.

Returns:

Calculated distance.

Return type:

float

calculate_in_parallel(target_params, df, num_jobs=4)[source]#

Calculate distances in parallel.

Parameters:
  • target_params (dict) – Dictionary of target parameters.

  • df (pd.DataFrame) – The DataFrame containing rows to calculate distances for.

  • num_jobs (int) – Number of jobs for parallel processing.

Returns:

Series of calculated distances.

Return type:

pd.Series
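The shape of the strategy interface, an abstract calculate plus a parallel helper, can be sketched as follows. The class names are hypothetical, a thread pool stands in for whatever parallel backend the library uses, and a list of dicts stands in for the DataFrame and the returned Series.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class MetricStrategySketch(ABC):
    @abstractmethod
    def calculate(self, target_params, row):
        """Return the distance between target_params and one row."""

    def calculate_in_parallel(self, target_params, rows, num_jobs=4):
        # pool.map preserves input order, so distances line up with rows.
        with ThreadPoolExecutor(max_workers=num_jobs) as pool:
            return list(pool.map(lambda row: self.calculate(target_params, row), rows))


class ManhattanSketch(MetricStrategySketch):
    def calculate(self, target_params, row):
        return sum(abs(row[key] - value) for key, value in target_params.items())


rows = [{"g_MHz": 68.0}, {"g_MHz": 75.0}, {"g_MHz": 70.0}]
distances = ManhattanSketch().calculate_in_parallel({"g_MHz": 70.0}, rows, num_jobs=2)
print(distances)  # [2.0, 5.0, 0.0]
```

Because calculate is abstract, the base class cannot be instantiated directly; only concrete strategies can be passed around.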

class squadds.core.metrics.WeightedEuclideanMetric(weights)[source]#

Bases: MetricStrategy

Concrete class for weighted Euclidean metric.

Initialize the weights.

Parameters:

weights (dict) – Dictionary of weights for each parameter.

calculate(target_params, row)[source]#

Calculate the weighted Euclidean distance between target parameters and a DataFrame row.

Parameters:
  • target_params (dict) – Dictionary of target parameters.

  • row (pd.Series) – A row from a DataFrame.

Returns:

Calculated weighted Euclidean distance.

Return type:

float
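A sketch of one common definition of a weighted Euclidean distance, sqrt(sum_i w_i (x_i - t_i)^2). The exact form used by WeightedEuclideanMetric is not given in this reference, so the formula, the default weight of 1.0 for missing keys, and the name `weighted_euclidean` are all assumptions.

```python
import math


def weighted_euclidean(target_params, row, weights):
    """sqrt of the weighted sum of squared differences over the target keys."""
    total = 0.0
    for key, target in target_params.items():
        weight = weights.get(key, 1.0)  # assumed default for unweighted keys
        total += weight * (row[key] - target) ** 2
    return math.sqrt(total)


target = {"g_MHz": 70.0, "kappa_kHz": 100.0}
row = {"g_MHz": 73.0, "kappa_kHz": 104.0}
print(weighted_euclidean(target, row, {"g_MHz": 1.0, "kappa_kHz": 1.0}))  # 5.0
print(weighted_euclidean(target, row, {"g_MHz": 1.0, "kappa_kHz": 0.0}))  # 3.0
```

Setting a weight to zero removes that parameter from the ranking, which is the practical use of the weights dictionary.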

squadds.core.processing module#

squadds.core.processing.merge_dfs(qubit_df_split, cavity_df, merger_terms)[source]#
squadds.core.processing.unify_columns(df)[source]#
squadds.core.processing.update_cavity_frequency_and_kappa(merged_df, Z0=50)[source]#

Updates the cavity frequency and kappa based on the given merged_df DataFrame.

Parameters:
  • merged_df – DataFrame containing the necessary simulation results.

  • Z0 – Characteristic impedance of the system, in ohms. Defaults to 50.

Returns:
  • cavity_frequency_updated – Updated cavity frequency in Hz.

  • kappa – Updated kappa in Hz.

squadds.core.processing.update_ncap_parameters(cavity_df, ncap_df, merger_terms, ncap_sim_cols)[source]#

Updates the kappa and frequency of the cavity based on the results of the CapNInterdigitalTee simulations.

squadds.core.utils module#

squadds.core.utils.can_be_categorical(column)[source]#

Check if all elements in the column are hashable.

squadds.core.utils.columns_memory_usage(df)[source]#

Calculates the memory usage of each column and returns a DataFrame showing each column’s memory usage and percentage of total memory usage.

Parameters:

df – DataFrame to process.

Returns:

mem_usage_df – DataFrame with columns ‘Column’, ‘Memory Usage (MB)’, and ‘Percentage of Total Memory Usage’.

squadds.core.utils.compare_schemas(data_schema, expected_schema, path='')[source]#

Compare two schemas and raise an error if there are any mismatches.

Parameters:
  • data_schema (dict) – The data schema to compare.

  • expected_schema (dict) – The expected schema to compare against.

  • path (str, optional) – The current path in the schema. Used for error messages. Defaults to ‘’.

Raises:
  • ValueError – If there is a key in the data schema that is not present in the expected schema.

  • ValueError – If there is a type mismatch between the data schema and the expected schema.
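The two error cases above suggest a recursive walk over the data schema. The sketch below is a hypothetical reimplementation (`compare_schemas_sketch`), with the error message wording invented for illustration.

```python
def compare_schemas_sketch(data_schema, expected_schema, path=""):
    """Raise ValueError on an unexpected key or a type mismatch."""
    for key, data_type in data_schema.items():
        location = f"{path}.{key}" if path else key
        if key not in expected_schema:
            raise ValueError(f"Unexpected key: {location}")
        expected_type = expected_schema[key]
        if isinstance(data_type, dict) and isinstance(expected_type, dict):
            # Recurse into nested schemas, extending the path for error messages.
            compare_schemas_sketch(data_type, expected_type, location)
        elif data_type != expected_type:
            raise ValueError(
                f"Type mismatch at {location}: expected {expected_type!r}, got {data_type!r}"
            )


expected = {"design": {"claw_length": "str"}, "sim_results": {"kappa_kHz": "float"}}
compare_schemas_sketch({"design": {"claw_length": "str"}}, expected)  # passes silently

try:
    compare_schemas_sketch({"sim_results": {"kappa_kHz": "str"}}, expected)
except ValueError as error:
    message = str(error)
    print(message)
```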

squadds.core.utils.compute_memory_usage(df)[source]#

Compute the memory usage of the given DataFrame.

Parameters:

df (pandas.DataFrame) – The DataFrame to compute the memory usage for.

Returns:

The memory usage of the DataFrame in megabytes.

Return type:

float

squadds.core.utils.convert_list_to_str(lst)[source]#

Converts the given list of floats to a string representation.

Parameters:

lst (list) – The list of floats to be converted.

Returns:

The string representation of the list.

Return type:

str

squadds.core.utils.convert_numpy(obj)[source]#

Converts NumPy arrays to Python lists recursively.

Parameters:

obj – The object to be converted.

Returns:

The converted object.

squadds.core.utils.convert_to_numeric(value)[source]#

Converts a value to a numeric type if possible.

Parameters:

value – The value to be converted.

Returns:

The converted value if it can be converted to int or float, otherwise returns the original value.
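A minimal sketch of the int-then-float fallback described above. `convert_to_numeric_sketch` is a hypothetical name, and passing existing numbers through unchanged (so that a float is not truncated by `int`) is an assumption about the intended behaviour.

```python
def convert_to_numeric_sketch(value):
    """Return value as int or float when possible, otherwise unchanged."""
    if isinstance(value, (int, float)):
        return value  # already numeric; avoid int() truncating a float
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            continue
    return value


print(convert_to_numeric_sketch("42"))    # 42
print(convert_to_numeric_sketch("3.5"))   # 3.5
print(convert_to_numeric_sketch("10um"))  # 10um
```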

squadds.core.utils.convert_to_str(value, units)[source]#

Converts the given value to a string with the given units.

Parameters:
  • value (float) – The value to be converted.

  • units (str) – The units to be appended to the value.

Returns:

The value as a string with the units.

Return type:

str

Create a mailto link with the given recipients, subject, and body.

Parameters:
  • recipients (list) – A list of email addresses of the recipients.

  • subject (str) – The subject of the email.

  • body (str) – The body of the email.

Returns:

The generated mailto link.

Return type:

str
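A mailto link of this kind can be assembled with the standard library. The function name `build_mailto_link` is hypothetical (the signature is not shown in this reference), and the percent-encoding choices are assumptions.

```python
from urllib.parse import quote


def build_mailto_link(recipients, subject, body):
    """Build a mailto: URL with percent-encoded subject and body."""
    to = ",".join(recipients)
    return f"mailto:{to}?subject={quote(subject)}&body={quote(body)}"


link = build_mailto_link(
    ["a@example.com", "b@example.com"], "Dataset question", "Hello, world"
)
print(link)
# mailto:a@example.com,b@example.com?subject=Dataset%20question&body=Hello%2C%20world
```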

squadds.core.utils.create_unified_design_options(row)[source]#

Create a unified design options dictionary based on the given row.

Parameters:

row (pandas.Series) – The row containing the design options.

Returns:

The unified design options dictionary.

Return type:

dict

squadds.core.utils.delete_HF_cache()[source]#

Deletes the cache directory for the specific dataset.

squadds.core.utils.delete_categorical_columns(df)[source]#

Deletes all columns of type ‘category’ from the DataFrame.

Parameters:

df – DataFrame to process.

Returns:

df – DataFrame with ‘category’ columns removed.

squadds.core.utils.delete_object_columns(df)[source]#

Deletes all columns of type ‘object’ from the DataFrame.

Parameters:

df – DataFrame to process.

Returns:

df – DataFrame with ‘object’ columns removed.

squadds.core.utils.filter_df_by_conditions(df, conditions)[source]#

Filter a DataFrame based on given conditions.

Parameters:
  • df (pandas.DataFrame) – The DataFrame to be filtered.

  • conditions (dict) – A dictionary containing column-value pairs as conditions.

Returns:

The filtered DataFrame.

Return type:

pandas.DataFrame

Raises:

None

squadds.core.utils.flatten_df_second_level(df)[source]#

Flattens a DataFrame by expanding dictionary-like data in the second level of columns.

Parameters:

df (pandas.DataFrame) – The DataFrame to be flattened.

Returns:

A new DataFrame with the flattened data.

Return type:

pandas.DataFrame

squadds.core.utils.get_config_schema(entry)[source]#

Generates the schema for the given entry with specific rules. The ‘sim_results’ are fully expanded, while others are expanded to the first level.

squadds.core.utils.get_entire_schema(obj)[source]#

Recursively traverses the given object and returns a schema representation.

Parameters:

obj – The object to generate the schema for.

Returns:

The schema representation of the object.

squadds.core.utils.get_schema(obj)[source]#

Returns the schema of the given object.

Parameters:

obj – The object for which the schema needs to be determined.

Returns:

The schema of the object. If the object is a dictionary, the schema will be a dictionary with the same keys as the original dictionary, where the values represent the schema of the corresponding values in the original dictionary. If the object is a list, the schema will be either ‘dict’ if the list contains dictionaries, or the type name of the first element in the list. For any other type of object, the schema will be the type name of the object.
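The rules above can be sketched as a short recursive function. This is a reconstruction from the description, not the library source; the name `sketch_schema` is hypothetical, and the handling of an empty list (returning 'list') is an assumption, since the description does not cover that case.

```python
def sketch_schema(obj):
    """Reconstruction of the documented rules; not the SQuADDS source."""
    if isinstance(obj, dict):
        # Same keys, each value replaced by the schema of that value.
        return {key: sketch_schema(value) for key, value in obj.items()}
    if isinstance(obj, list):
        if any(isinstance(item, dict) for item in obj):
            return "dict"
        # Assumption: an empty list has no first element, so report 'list'.
        return type(obj[0]).__name__ if obj else "list"
    return type(obj).__name__


entry = {
    "design_options": {"width": "10um", "turns": 3},
    "freqs": [5.1, 5.2],
    "ports": [{"name": "a"}],
}
schema = sketch_schema(entry)
print(schema)
```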

squadds.core.utils.get_sim_results_keys(dataframes)[source]#

Get the unique keys from the ‘sim_results’ column of the given dataframes.

Parameters:

dataframes (list or pandas.DataFrame) – A list of dataframes or a single dataframe.

Returns:

A list of unique keys extracted from the ‘sim_results’ column.

Return type:

list
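One way to gather such keys is sketched below, assuming each 'sim_results' cell holds a dictionary. The helper name `collect_sim_result_keys` is hypothetical, and returning the keys in sorted order is a choice made here for reproducibility rather than documented behavior.

```python
import pandas as pd


def collect_sim_result_keys(dataframes):
    """Hypothetical sketch: union of dict keys found in 'sim_results'."""
    if isinstance(dataframes, pd.DataFrame):
        dataframes = [dataframes]  # accept a single DataFrame as well
    keys = set()
    for df in dataframes:
        if "sim_results" not in df.columns:
            continue
        for cell in df["sim_results"]:
            if isinstance(cell, dict):
                keys.update(cell.keys())
    return sorted(keys)


df_a = pd.DataFrame({"sim_results": [{"cross_to_claw": 1.0}, {"cross_to_ground": 2.0}]})
df_b = pd.DataFrame({"sim_results": [{"kappa": 3.0, "cross_to_claw": 4.0}]})
keys = collect_sim_result_keys([df_a, df_b])
print(keys)
```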

squadds.core.utils.get_type(value)[source]#
squadds.core.utils.is_float(value)[source]#
squadds.core.utils.optimize_dataframe(df)[source]#

Optimize the memory usage of a pandas DataFrame by downcasting data types.

Parameters:

df (pandas.DataFrame) – The DataFrame to be optimized.

Returns:

The optimized DataFrame.

Return type:

pandas.DataFrame
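Downcasting in general can be sketched with `pandas.to_numeric`. The helper `downcast_numeric` below is hypothetical and only demonstrates the technique on integer and float columns; the actual function may handle other dtypes or use different rules.

```python
import pandas as pd


def downcast_numeric(df):
    """Hypothetical sketch: shrink integer and float columns where possible."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


df = pd.DataFrame({"turns": [1, 2, 3], "gap": [0.5, 1.5, 2.5]})
small = downcast_numeric(df)
print(small.dtypes)
print(df.memory_usage(deep=True).sum(), small.memory_usage(deep=True).sum())
```

Note that downcasting floats to 32 bits reduces precision, so it is worth checking that downstream numerical comparisons tolerate it.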

squadds.core.utils.print_column_types(df)[source]#

Prints out the data type of each column in the DataFrame.

Parameters:

df – The DataFrame to analyze.

squadds.core.utils.process_design_options(merged_df)[source]#

Processes the ‘design_options’ column in merged_df, appends new columns, converts values, and drops ‘design_options’.

Parameters:

merged_df – The DataFrame containing the ‘design_options’ column.

Returns:

The modified DataFrame with new columns added and ‘design_options’ dropped.

squadds.core.utils.save_intermediate_df(df, filename, file_idx)[source]#

Save the intermediate DataFrame to disk in Parquet format.

Parameters:
  • df (pd.DataFrame) – The DataFrame to save.

  • filename (str) – The base name of the file to save the DataFrame to.

  • file_idx (int) – The index of the file chunk.

squadds.core.utils.send_email_via_client(dataset_name, institute, pi_name, date, dataset_link)[source]#

Sends an email notification to recipients with the details of the created dataset.

Parameters:
  • dataset_name (str) – The name of the dataset.

  • institute (str) – The name of the institute where the dataset was created.

  • pi_name (str) – The name of the principal investigator who created the dataset.

  • date (str) – The date when the dataset was created.

  • dataset_link (str) – The link to the created dataset.

Returns:

None

squadds.core.utils.set_github_token()[source]#

Sets the GitHub token by appending it to the .env file. If the token already exists in the .env file, it does not add it again. If the GitHub token is not found, it raises a ValueError.

squadds.core.utils.set_huggingface_api_key()[source]#

Sets the Hugging Face API key by appending it to the .env file. If the API key already exists in the .env file, it does not add it again. If the Hugging Face token is not found, it raises a ValueError.
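The append-only-if-missing behavior described for both token setters can be sketched as follows. The helper `append_env_entry`, the key name `EXAMPLE_TOKEN`, and passing the token as an argument are all assumptions made for illustration; how SQuADDS actually obtains the token is not stated here.

```python
import os
import tempfile


def append_env_entry(env_path, key, value):
    """Hypothetical sketch: add KEY=value to a .env file unless KEY is present."""
    if not value:
        raise ValueError(f"{key} not found")
    if os.path.exists(env_path):
        with open(env_path) as fh:
            if any(line.strip().startswith(f"{key}=") for line in fh):
                return False  # already present, leave the file untouched
    with open(env_path, "a") as fh:
        fh.write(f"{key}={value}\n")
    return True


# Use a throwaway directory so the example never touches a real .env file.
env_path = os.path.join(tempfile.mkdtemp(), ".env")
first = append_env_entry(env_path, "EXAMPLE_TOKEN", "abc123")
second = append_env_entry(env_path, "EXAMPLE_TOKEN", "abc123")
print(first, second)
```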

squadds.core.utils.validate_types(data_part, schema_part)[source]#

Recursively validates the types of data_part against the expected types defined in schema_part.

Parameters:
  • data_part (dict) – The data to be validated.

  • schema_part (dict) – The schema defining the expected types.

Raises:

TypeError – If the type of any key in data_part does not match the expected type in schema_part.

Returns:

None
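A recursive type check of this kind can be sketched as below. The schema format is an assumption: it is taken to map keys to type names as strings, with nested dictionaries for nested data, similar to what get_schema returns. The name `check_types` is hypothetical, and skipping keys absent from the data is a choice made for this sketch.

```python
def check_types(data_part, schema_part):
    """Hypothetical sketch; assumes the schema stores type names as strings."""
    for key, expected in schema_part.items():
        if key not in data_part:
            continue  # assumption: a missing key is not a type error
        value = data_part[key]
        if isinstance(expected, dict):
            if not isinstance(value, dict):
                raise TypeError(f"{key!r}: expected dict, got {type(value).__name__}")
            check_types(value, expected)  # recurse into the nested level
        elif type(value).__name__ != expected:
            raise TypeError(
                f"{key!r}: expected {expected}, got {type(value).__name__}"
            )


schema = {"name": "str", "sim": {"f": "float"}}
check_types({"name": "q1", "sim": {"f": 5.0}}, schema)  # passes silently

try:
    check_types({"name": "q1", "sim": {"f": "5.0"}}, schema)
except TypeError as err:
    print(err)
```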

squadds.core.utils.view_contributors_from_rst(rst_file_path)[source]#

Extract and print relevant contributor information from the index.rst file.

Parameters:

rst_file_path (str) – The path to the index.rst file.

Returns:

None

Module contents#