Tutorial 3: Contributing to SQuADDS#

In this tutorial, we will go over the basics of contributing data to the SQuADDS project. We will cover the following topics:

  1. Contribution Information Setup

  2. Understanding the terminology and database structure

  3. Contributing to an existing dataset configuration

  4. Creating new dataset configuration

[1]:
%load_ext autoreload
%autoreload 2

Contribution Information Setup#

In order to contribute to SQuADDS, you will need to provide some information about yourself. This information will be used to track your contributions and to give you credit for your work. You can provide this information by updating the following variables in the .env file in the root directory of the repository:

GROUP_NAME = ""
PI_NAME = ""
INSTITUTION = ""
USER_NAME = ""
CONTRIB_MISC = ""

where GROUP_NAME is the name of your research group, PI_NAME is the name of your PI, INSTITUTION is the name of your institution, USER_NAME is your name, and CONTRIB_MISC is any other information you would like to provide about your contributions (e.g. a BibTeX citation, paper link, etc.).
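For example, a filled-in .env might look like this (all values below are placeholders):

GROUP_NAME = "Quantum Devices Group"
PI_NAME = "Jane Doe"
INSTITUTION = "Example University"
USER_NAME = "John Smith"
CONTRIB_MISC = "arXiv:XXXX.XXXXX"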

Alternatively, you can provide this information by executing the following cell.

[2]:
from squadds.database.utils import *
[5]:
create_contributor_info()
Contributor information updated in .env file (c:\Users\PowerAdmin.WIN-NQ8Q8E6B720\.conda\envs\qiskit_metal\Lib\site-packages/.env).

Ensure that the HUGGINGFACE_API_KEY is also set.

[ ]:
from squadds.core.utils import set_huggingface_api_key

set_huggingface_api_key()

Later in the tutorial, we introduce some functionalities that require a GitHub token. If you do not have a GitHub token, you can create one by following the instructions here. Create a Personal Access Token (PAT) on GitHub with the repo scope and save it as GITHUB_TOKEN in the .env file located at the root of the project.
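If you prefer to write the token into the file programmatically, a minimal sketch using python-dotenv (assuming it is installed) would be:

from dotenv import set_key

# Writes GITHUB_TOKEN into the .env file at the project root.
# Replace the placeholder value with your actual PAT.
set_key(".env", "GITHUB_TOKEN", "ghp_your_token_here")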

Alternatively, you can execute the following cell to set the GITHUB_TOKEN.

[ ]:
from squadds.core.utils import set_github_token

set_github_token()

The last thing you need to do is add your public SSH key to your HuggingFace account (https://huggingface.co/settings/keys).

Understanding the terminology and database structure#

HuggingFace#

HuggingFace stands at the forefront of the AI revolution, offering a dynamic collaboration platform for the machine learning community. Renowned for hosting an array of open-source machine learning libraries and tools, the HuggingFace Hub serves as a central repository where individuals can share, explore, and innovate with ML technologies. The platform is dedicated to fostering an environment of learning, collaboration, and ethical AI, bringing together a rapidly expanding community of ML engineers, scientists, and enthusiasts.

In our pursuit to enhance the versatility and utility of SQuADDS for quantum hardware developers and machine learning researchers, we have chosen to host our database on the HuggingFace platform. This strategic decision leverages HuggingFace’s capability to support and facilitate research with machine learning models, aligning with methodologies outlined in various references. By making the SQuADDS database readily accessible on this platform, we aim to contribute to the development of cutting-edge Electronic Design Automation (EDA) tools. Our goal is to replicate the transformative impact witnessed in the semiconductor industry, now in the realm of superconducting quantum hardware.

Key to our choice of HuggingFace is its datasets library, which provides a unified interface for accessing a wide range of datasets. This feature is integral to SQuADDS, offering a streamlined and cohesive interface to our database. The decentralized nature of HuggingFace datasets significantly enhances community-driven development and access, a functionality that can be challenging to implement with traditional data storage platforms. This aspect of HuggingFace aligns perfectly with our vision for SQuADDS, enabling us to foster a collaborative and open environment for innovation in quantum technology.

Datasets & Configurations#

As seen in Tutorial 1, we have organized the SQuADDS database into datasets and configurations. Let’s quickly review these two concepts and how they are used in SQuADDS.

Each configuration in the dataset is uniquely identified by its config string. For the SQuADDS database, the config string is created in the following format:

config = f"{component}-{component_name}-{data_type}"

where component is the component category (e.g. qubit, cavity_claw, coupler), component_name is the class name of the component in Qiskit Metal (e.g. TransmonCross), and data_type is the type of simulation data that has been contributed (e.g. cap_matrix).
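For example, the capacitance matrix data for the TransmonCross qubit corresponds to the following config string:

component = "qubit"               # component category
component_name = "TransmonCross"  # Qiskit Metal class name
data_type = "cap_matrix"          # type of simulation data

config = f"{component}-{component_name}-{data_type}"
print(config)  # qubit-TransmonCross-cap_matrix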

This structured approach ensures that users can query specific parts of the dataset relevant to their work, such as a particular type of qubit design or simulation results. This API abstraction allows for more complex queries and operations on the data, facilitating a more efficient workflow for researchers and developers.

Let’s check what the config strings look like for our database:

[5]:
from datasets import get_dataset_config_names

configs = get_dataset_config_names("SQuADDS/SQuADDS_DB")
print(configs)
['qubit-TransmonCross-cap_matrix', 'cavity_claw-RouteMeander-eigenmode', 'coupler-NCap-cap_matrix']

You can now access the database using the config string. For example, if you want to access the qubit-TransmonCross-cap_matrix configuration, you can do so by executing the following cell:

[6]:
from datasets import load_dataset

qubit_data = load_dataset("SQuADDS/SQuADDS_DB", configs[0])
print(qubit_data)
DatasetDict({
    train: Dataset({
        features: ['design', 'sim_options', 'sim_results', 'notes', 'contributor'],
        num_rows: 1934
    })
})

Please review the section “Using the SQuADDS API to access and analyze the database” in Tutorial 1, where we explain how to use the SQuADDS API to access and analyze the database.

Each contributed entry to SQuADDS must AT LEAST have the following fields. You can add as many supplementary fields as you want.

{
    "design":{
        "design_tool": design_tool_name,
        "design_options": design_options,
    },
    "sim_options":{
        "setup": sim_setup_options,
        "simulator": simulator_name,
    },
    "sim_results":{
        "result1": sim_result1,
        "result1_unit": unit1,
        "result2": sim_result2,
        "result2_unit": unit2,
    },
    "contributor":{
        "group": group_name,
        "PI": pi_name,
        "institution": institution,
        "uploader": user_name,
        "misc": contrib_misc,
        "date_created": "YYYY-MM-DD-HHMMSS",
    },
}

If all the sim_results share the same units, you can use a single "units": units field instead of repeating the unit for each result.
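For instance, using values from the entry we build later in this tutorial, the two forms below carry the same information:

# Per-result units:
sim_results = {
    "cross_to_claw": 4.517,
    "cross_to_claw_unit": "fF",
    "cross_to_cross": 164.52267,
    "cross_to_cross_unit": "fF",
}

# Equivalent shared-units form, since every result is in fF:
sim_results = {
    "cross_to_claw": 4.517,
    "cross_to_cross": 164.52267,
    "units": "fF",
}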

Note: The "contributor" field is automatically added by the SQuADDS API when you upload your dataset. You do not need to add this field yourself.

Let’s look at the schema for the qubit-TransmonCross-cap_matrix configuration, which used qiskit-metal as the design tool and Ansys HFSS as the simulation engine.

[7]:
from squadds import SQuADDS_DB

db = SQuADDS_DB()
[8]:
db.get_dataset_info(component="qubit", component_name="TransmonCross", data_type="cap_matrix")
================================================================================
Dataset Features:
{'contributor': {'PI': Value(dtype='string', id=None),
                 'date_created': Value(dtype='string', id=None),
                 'group': Value(dtype='string', id=None),
                 'institution': Value(dtype='string', id=None),
                 'uploader': Value(dtype='string', id=None)},
 'design': {'design_options': {...},
            'design_tool': Value(dtype='string', id=None)},
 'notes': {},
 'sim_options': {'renderer_options': {...},
                 'setup': {...},
                 'simulator': Value(dtype='string', id=None)},
 'sim_results': {'claw_to_claw': Value(dtype='float64', id=None),
                 'claw_to_ground': Value(dtype='float64', id=None),
                 'cross_to_claw': Value(dtype='float64', id=None),
                 'cross_to_cross': Value(dtype='float64', id=None),
                 'cross_to_ground': Value(dtype='float64', id=None),
                 'ground_to_ground': Value(dtype='float64', id=None),
                 'units': Value(dtype='string', id=None)}}

Dataset Description:


Dataset Citation:


Dataset Homepage:


Dataset License:


Dataset Size in Bytes:
9735651
================================================================================

Contributing to an existing configuration#

Single Entry Contribution:#

Let’s revisit Tutorial 2 where we simulated a novel TransmonCross qubit design. We will now learn how to contribute this design to the SQuADDS database.

We have provided a simple API for contributing to the SQuADDS database. The high-level steps for contributing to an existing configuration via the SQuADDS API are as follows:

  1. Select the dataset configuration: Select the dataset configuration you would like to contribute to.

  2. Validate your data: Validate your data against the dataset configuration.

  3. Submit your data: Submit your data to the SQuADDS database.

Using the example from Tutorial 2, we will now go through each of these steps.

[6]:
from squadds.database.contributor import ExistingConfigData
[10]:
data = ExistingConfigData('qubit-TransmonCross-cap_matrix')
[11]:
data.show_config_schema()
{
  "design": {
    "design_options": "dict",
    "design_tool": "str"
  },
  "sim_options": {
    "renderer_options": "dict",
    "setup": "dict",
    "simulator": "str"
  },
  "sim_results": {
    "claw_to_claw": "float",
    "claw_to_ground": "float",
    "cross_to_claw": "float",
    "cross_to_cross": "float",
    "cross_to_ground": "float",
    "ground_to_ground": "float",
    "units": "str"
  },
  "notes": {},
  "contributor": "dict"
}
[10]:
design_options = {'pos_x': '-1500um',
  'pos_y': '1200um',
  'orientation': '-90',
  'chip': 'main',
  'layer': '1',
  'connection_pads': {'readout': {'connector_type': '0',
    'claw_length': '190um',
    'ground_spacing': '10um',
    'claw_width': '15um',
    'claw_gap': '5.1um',
    'claw_cpw_length': '40um',
    'claw_cpw_width': '10um',
    'connector_location': '90'}},
  'cross_width': '30um',
  'cross_length': '310um',
  'cross_gap': '30um',
  'hfss_inductance': 9.686e-09,
  'hfss_capacitance': 0,
  'hfss_resistance': 0,
  'hfss_mesh_kw_jj': 7e-06,
  'q3d_inductance': '10nH',
  'q3d_capacitance': 0,
  'q3d_resistance': 0,
  'q3d_mesh_kw_jj': 7e-06,
  'gds_cell_name': 'my_other_junction',
  'aedt_q3d_inductance': 1e-08,
  'aedt_q3d_capacitance': 0,
  'aedt_hfss_inductance': 9.686e-09,
  'aedt_hfss_capacitance': 0}
[11]:
data.add_design({"design_options": design_options, "design_tool":"qiskit_metal"})
[12]:
data.add_sim_result("cross_to_ground", 157.6063, "fF")
data.add_sim_result("claw_to_ground", 101.24431, "fF")
data.add_sim_result("cross_to_claw", 4.517, "fF")
data.add_sim_result("cross_to_cross", 164.52267, "fF")
data.add_sim_result("claw_to_claw", 106.18101, "fF")
data.add_sim_result("ground_to_ground", 320.80404, "fF")

Since we had the simulation data stored in examples/single_xmon_lom.json, we can read the required fields from it for convenience, then add the simulation setup and a note.

[13]:
import json

# read file
results_file = json.load(open("examples/single_xmon_lom.json", "r"))
sim_options = results_file["sim_options"]
setup = sim_options["setup"]
renderer_options = sim_options["renderer_options"]

data.add_sim_setup({
"setup": setup,
"simulator": "ANSYS HFSS",
"renderer_options": renderer_options
})

data.add_notes({"message":"this is a test entry"})

Let’s see what the entry that we have built so far looks like.

[16]:
data.show()
{
    "design": {
        "design_tool": "qiskit_metal",
        "design_options": {
            "pos_x": "-1500um",
            "pos_y": "1200um",
            "orientation": "-90",
            "chip": "main",
            "layer": "1",
            "connection_pads": {
                "readout": {
                    "connector_type": "0",
                    "claw_length": "190um",
                    "ground_spacing": "10um",
                    "claw_width": "15um",
                    "claw_gap": "5.1um",
                    "claw_cpw_length": "40um",
                    "claw_cpw_width": "10um",
                    "connector_location": "90"
                }
            },
            "cross_width": "30um",
            "cross_length": "310um",
            "cross_gap": "30um",
            "hfss_inductance": 9.686e-09,
            "hfss_capacitance": 0,
            "hfss_resistance": 0,
            "hfss_mesh_kw_jj": 7e-06,
            "q3d_inductance": "10nH",
            "q3d_capacitance": 0,
            "q3d_resistance": 0,
            "q3d_mesh_kw_jj": 7e-06,
            "gds_cell_name": "my_other_junction",
            "aedt_q3d_inductance": 1e-08,
            "aedt_q3d_capacitance": 0,
            "aedt_hfss_inductance": 9.686e-09,
            "aedt_hfss_capacitance": 0
        }
    },
    "sim_options": {
        "setup": {
            "name": "sweep_setup",
            "reuse_selected_design": false,
            "reuse_setup": false,
            "freq_ghz": 5.0,
            "save_fields": false,
            "enabled": true,
            "max_passes": 30,
            "min_passes": 2,
            "min_converged_passes": 1,
            "percent_error": 0.1,
            "percent_refinement": 30,
            "auto_increase_solution_order": true,
            "solution_order": "High",
            "solver_type": "Iterative",
            "run": {
                "name": "LOMv2.0",
                "components": [
                    "xmon"
                ],
                "open_terminations": [
                    [
                        "xmon",
                        "readout"
                    ]
                ],
                "box_plus_buffer": true
            }
        },
        "simulator": "ANSYS HFSS",
        "renderer_options": {
            "Lj": "10nH",
            "Cj": 0,
            "_Rj": 0,
            "max_mesh_length_jj": "7um",
            "max_mesh_length_port": "7um",
            "project_path": null,
            "project_name": null,
            "design_name": null,
            "x_buffer_width_mm": 0.2,
            "y_buffer_width_mm": 0.2,
            "wb_threshold": "400um",
            "wb_offset": "0um",
            "wb_size": 5,
            "plot_ansys_fields_options": {
                "name": "NAME:Mag_E1",
                "UserSpecifyName": "0",
                "UserSpecifyFolder": "0",
                "QuantityName": "Mag_E",
                "PlotFolder": "E Field",
                "StreamlinePlot": "False",
                "AdjacentSidePlot": "False",
                "FullModelPlot": "False",
                "IntrinsicVar": "Phase='0deg'",
                "PlotGeomInfo_0": "1",
                "PlotGeomInfo_1": "Surface",
                "PlotGeomInfo_2": "FacesList",
                "PlotGeomInfo_3": "1"
            }
        }
    },
    "sim_results": {
        "cross_to_ground": 157.6063,
        "claw_to_ground": 101.24431,
        "cross_to_claw": 4.517,
        "cross_to_cross": 164.52267,
        "claw_to_claw": 106.18101,
        "ground_to_ground": 320.80404,
        "units": "fF"
    },
    "contributor": {
        "group": "Levenson-Falk Lab",
        "PI": "Eli Levenson-Falk",
        "institution": "USC",
        "uploader": "Sadman Ahmed Shanto",
        "misc": "https://arxiv.org/pdf/2312.13483.pdf",
        "date_created": "2024-01-17 222441"
    },
    "notes": {
        "message": "this is a test entry"
    }
}

It looks correct by eye, but let’s ensure it is actually valid by executing the following cell.

[17]:
data.validate()
Structure validated successfully....
Types validated successfully....

Missing keys found. These keys are present in one dictionary but not the other:

Key: contributor.misc is missing in 'ref'
Key: notes.message is missing in 'ref'

There are no error messages; the missing keys are optional keys that we could add if we wanted, but we can certainly move on without them. We are now ready to submit our data to the SQuADDS database.

Uploading the data#

We can upload our validated entries via HuggingFace. The high-level steps are as follows:

  1. Clone/Fork the Repository: If you have not already forked or cloned the repository, please do so.

  2. Create or Checkout a Branch: If adding new data, it might be best to do so on a new branch: git checkout -b branch_name.

  3. Modify the Configuration: Append your validated data entries to your selected dataset configuration.

  4. Commit and Push Your Changes: Commit the new data and push it to your fork: git add . && git commit -m "GOOD COMMIT MESSAGE" && git push origin branch_name

  5. Pull Request: Create a pull request against the original SQuADDS_DB repository.

Of course, you can do all of this manually, but we have provided a simple API for it.
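For reference, a minimal sketch of the manual route using Python’s subprocess module is shown below; the repository path, branch name, and commit message are placeholders.

import subprocess

def run(cmd, cwd):
    """Run a shell command inside the repository, raising on failure."""
    subprocess.run(cmd, cwd=cwd, check=True)

path_to_repo = "/path/to/SQuADDS_DB"       # placeholder: your local fork
branch_name = "my-transmon-contribution"   # placeholder branch name

run(["git", "checkout", "-b", branch_name], path_to_repo)
# ... append your validated entries to the configuration JSON here ...
run(["git", "add", "."], path_to_repo)
run(["git", "commit", "-m", "Add TransmonCross cap_matrix entry"], path_to_repo)
run(["git", "push", "origin", branch_name], path_to_repo)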

Set path_to_repo to the path of your local copy of the SQuADDS_DB repository, or to where you want it to be cloned.

[9]:
path_to_repo = "/Users/shanto/LFL/scratch/hf" #replace with your path to the repo

The following method will automatically clone the latest version of the repository (if there are no conflicts) and create (and check out) a new branch.

[36]:
data.update_repo(path_to_repo)
Already up to date.

Now, you can append your validated data to your selected dataset configuration by executing the following cell. It also automatically commits and pushes your changes to your fork of the repository.

[19]:
data.update_db(path_to_repo)
Data added to qubit-TransmonCross-cap_matrix.json successfully.

Now you can upload this updated dataset to your fork of the repository and create a pull request against the original SQuADDS_DB repository. Unfortunately, HuggingFace has no API for creating pull requests, so you will have to do this step manually by going to your fork and opening a pull request against the original SQuADDS_DB repository.

We are actively working on setting up an Acceptance Server that will make the process of contributing data to the SQuADDS project even easier. Users would be able to use the following command (post-validation) to upload their data to the SQuADDS database, and the rest would be handled by the server:

data.submit()

Batch Mode for Validation and Submission:#

As an alternative to the workflow in the previous section, we can also use the SQuADDS API to validate and submit our data in batch mode (i.e. sweep data contribution). This is useful if you have a large number of entries to contribute to the database. The process of contributing multiple entries to the SQuADDS database is the same as the process of contributing a single entry explained in the previous section.

In fact, we will start by showing how to contribute the same data from Tutorial 2 to the SQuADDS database using this batch mode process.

We assume that in this workflow you have already studied the schema of the dataset configuration you want to contribute to and have structured your simulation script to output the data in that format, which you store in a JSON file (examples/single_xmon_lom.json).
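As a minimal sketch of what such a script might produce, the snippet below writes a single entry in the qubit-TransmonCross-cap_matrix format; the design and setup dicts are truncated placeholders, the output file name is arbitrary, and the contributor field is omitted since the API adds it automatically.

import json

# Placeholder dicts standing in for your actual design and simulation data.
design_options = {"cross_width": "30um", "cross_length": "310um"}
sim_setup = {"name": "sweep_setup", "max_passes": 30}

entry = {
    "design": {
        "design_tool": "qiskit_metal",
        "design_options": design_options,
    },
    "sim_options": {
        "setup": sim_setup,
        "simulator": "ANSYS HFSS",
        "renderer_options": {},
    },
    "sim_results": {
        "claw_to_claw": 106.18101,
        "claw_to_ground": 101.24431,
        "cross_to_claw": 4.517,
        "cross_to_cross": 164.52267,
        "cross_to_ground": 157.6063,
        "ground_to_ground": 320.80404,
        "units": "fF",
    },
    "notes": {},
}

with open("my_entry.json", "w") as f:
    json.dump(entry, f, indent=4)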

You still start with the ExistingConfigData object like before.

[20]:
data.clear()
[21]:
data = ExistingConfigData('qubit-TransmonCross-cap_matrix')

Now, you populate this object with your stored data file.

[27]:
json_file_path = "examples/single_xmon_lom.json"
data.from_json(json_file_path)
Contribution loaded successfully.

It is always a good idea to validate the entry (even under the assumption that we created the file in the correct format).

[28]:
data.validate()
Structure validated successfully....
Types validated successfully....

Missing keys found. These keys are present in one dictionary but not the other:

Key: contributor.misc is missing in 'ref'

Again, an optional key is missing, which we can disregard. Now you can upload this entry to your fork of the repository using the contribute() method, shown below for reference.

def contribute(self, path_to_repo, is_sweep=False):
    """
    Contributes to the repository by updating the local repo, updating the database, and uploading to HF.

    Args:
        path_to_repo (str): The path to the repository.
        is_sweep (bool): True if the contribution is a sweep, False otherwise.

    Returns:
        None
    """
    if not self.is_validated:
        raise ValueError("Data must be validated before contributing.")
    self.update_repo(path_to_repo)
    self.update_db(path_to_repo, is_sweep)
    print("Contribution ready for PR")
[29]:
data.contribute(path_to_repo)
Already up to date.
Data added to qubit-TransmonCross-cap_matrix.json successfully.
Contribution ready for PR

The same validation and contribution methods can be used for other configurations. We’ll take a look at two more examples: the RouteMeander eigenmode and the CapNInterdigitalTee capacitance matrix. When working with more than one configuration from the SQuADDS database, it may be helpful to give each ‘data’ variable a unique name.

Let’s start with the RouteMeander eigenmode configuration.

[7]:
data_eigenmode = ExistingConfigData('cavity_claw-RouteMeander-eigenmode')

With this configuration, we will now proceed as we did previously by reading in our data file.

[14]:
json_file_path = "examples/single_RouteMeander_eigenmode.json"
data_eigenmode.from_json(json_file_path)
Contribution loaded successfully.

It is important to validate every set of data that you wish to contribute; different configurations may have different acceptable formats.

[10]:
data_eigenmode.validate()
Structure validated successfully....
Types validated successfully....

Mismatched keys found. These keys are present in both dictionaries but have values of different types:

Key: contributor.misc, data type in 'data': <class 'str'>, data type in 'ref': <class 'NoneType'>

Missing keys found. These keys are present in one dictionary but not the other:

Key: design.design_options.cplr_opts.finger_length is missing in 'data'
Key: design.design_options.cplr_opts.cap_gap_ground is missing in 'data'
Key: design.design_options.cplr_opts.cap_width is missing in 'data'
Key: design.design_options.cplr_opts.cap_distance is missing in 'data'
Key: design.design_options.cplr_opts.cap_gap is missing in 'data'
Key: design.design_options.cplr_opts.finger_count is missing in 'data'
Key: design.design_options.cpw_opts.lead.end_straight is missing in 'data'
Key: design.design_options.cpw_opts.lead.start_jogged_extension is missing in 'data'

These missing keys can be ignored: this configuration accepts more than one coupler type, so the reference schema contains fields that do not apply to every entry.

After validation is complete, you can follow the same code as above to contribute your data to the repository to prepare for the final PR.

Finally, let’s see how to configure and validate data for the CapNInterdigitalTee capacitance matrix.

[11]:
data_cap = ExistingConfigData('coupler-CapNInterdigitalTee-cap_matrix')
[12]:
json_file_path = "examples/single_CapNinterdigital_cap_matrix.json"
data_cap.from_json(json_file_path)
Contribution loaded successfully.
[13]:
data_cap.validate()
Structure validated successfully....
Types validated successfully....

Missing keys found. These keys are present in one dictionary but not the other:

Key: contributor.misc is missing in 'ref'

Again, the contribution step for both of these configurations is the same as in our first example and would look as follows.

[ ]:
data_eigenmode.contribute(path_to_repo)

or

[ ]:
data_cap.contribute(path_to_repo)

Multiple Entry (Sweep) Contribution:#

The same process can be used to contribute multiple entries to the SQuADDS database. The only difference is that you provide a directory of data files (and set is_sweep=True) instead of a single data file.

Again, we assume that you have already studied the schema of the dataset configuration you want to contribute to and have structured your simulation script to output the data in that format, stored across multiple JSON files.
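As a minimal sketch, a sweep script might write one JSON file per design point; simulate_and_format() below is a hypothetical stand-in for your own simulation code and returns a truncated placeholder entry rather than real results.

import json
import os

output_dir = "examples/sweep_data/"
os.makedirs(output_dir, exist_ok=True)

def simulate_and_format(claw_length):
    """Hypothetical helper: run your simulation for the given claw_length
    and return an entry dict in the qubit-TransmonCross-cap_matrix format."""
    return {
        "design": {
            "design_tool": "qiskit_metal",
            "design_options": {"claw_length": claw_length},  # truncated placeholder
        },
        "sim_options": {"setup": {}, "simulator": "ANSYS HFSS", "renderer_options": {}},
        "sim_results": {"cross_to_claw": 4.517, "units": "fF"},  # truncated placeholder
        "notes": {},
    }

for i, claw_length in enumerate(["190um", "210um", "230um", "250um"]):
    entry = simulate_and_format(claw_length)
    with open(os.path.join(output_dir, f"entry_{i}.json"), "w") as f:
        json.dump(entry, f, indent=4)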

[30]:
data.clear()
[31]:
data = ExistingConfigData('qubit-TransmonCross-cap_matrix')
[33]:
json_files_path = "examples/sweep_data/"
data.from_json(json_files_path,is_sweep=True)
Sweep data loaded successfully.
[34]:
data.validate_sweep()
Validating entry 1 of 4...
Structure validated successfully....
Types validated successfully....
Entry 1 of 4 validated successfully.
--------------------------------------------------
Validating entry 2 of 4...
Structure validated successfully....
Types validated successfully....
Entry 2 of 4 validated successfully.
--------------------------------------------------
Validating entry 3 of 4...
Structure validated successfully....
Types validated successfully....
Entry 3 of 4 validated successfully.
--------------------------------------------------
Validating entry 4 of 4...
Structure validated successfully....
Types validated successfully....
Entry 4 of 4 validated successfully.
--------------------------------------------------
[35]:
data.contribute(path_to_repo, is_sweep=True)
Already up to date.
Data added to qubit-TransmonCross-cap_matrix.json successfully.
Contribution ready for PR

You are now ready to make the Pull Request against the original SQuADDS_DB repository.


Just like that, you have learned how to contribute to an existing dataset configuration in the SQuADDS database! 🎉

In the next tutorial, we will learn how to create a new dataset configuration in the SQuADDS database.

License#

This code is a part of SQuADDS

Developed by Sadman Ahmed Shanto & Adhish Chakravorty

This tutorial is written by Sadman Ahmed Shanto & Adhish Chakravorty

© Copyright Sadman Ahmed Shanto & Eli Levenson-Falk 2023.

This code is licensed under the MIT License. You may obtain a copy of this license in the LICENSE.txt file in the root directory of this source tree.

Any modifications or derivative works of this code must retain this copyright notice, and modified files need to carry a notice indicating that they have been altered from the originals.