squadds.database package
Submodules
squadds.database.HuggingFace module
- squadds.database.HuggingFace.add_column_to_dataset(dataset, column_name, column_data)
Add a new column to a dataset.
- Parameters:
dataset (Dataset) – Hugging Face dataset to which you want to add a column.
column_name (str) – Name of the new column.
column_data (list) – Data for the new column.
- Returns:
Dataset with the new column.
- Return type:
Dataset
- squadds.database.HuggingFace.add_row_to_dataset(dataset, row_data)
Add a new row to a dataset.
- Parameters:
dataset (Dataset) – The Hugging Face dataset to which you want to add a row.
row_data (dict) – The row data in dictionary format.
- Returns:
Dataset with the new row added.
- Return type:
Dataset
- squadds.database.HuggingFace.create_PR(repo_id, branch_name, title, description)
Create a Pull Request (PR) on Hugging Face Hub.
- Parameters:
repo_id (str) – The repo ID (namespace/repo) where the PR will be created.
branch_name (str) – The branch name where the changes are made.
title (str) – The title of the PR.
description (str) – A description of the changes made in the PR.
- Returns:
Information about the created PR.
- Return type:
dict
- squadds.database.HuggingFace.filter_dataset(dataset, filter_fn)
Filter a dataset based on a custom condition.
- Parameters:
dataset (Dataset) – Hugging Face dataset to filter.
filter_fn (function) – Function that returns True or False for filtering.
- Returns:
Filtered dataset.
- Return type:
Dataset
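As a rough illustration of the kind of `filter_fn` this function expects, the sketch below defines a predicate over a single row (a dict mapping column names to values) and applies it with Python's built-in `filter`. The column names `design_tool` and `qubit_frequency_GHz`, and the plain list of dicts standing in for a `Dataset`, are assumptions made for the example and are not taken from SQuADDS.

```python
# Hypothetical rows standing in for a Hugging Face Dataset (assumption).
rows = [
    {"design_tool": "qiskit-metal", "qubit_frequency_GHz": 4.2},
    {"design_tool": "qiskit-metal", "qubit_frequency_GHz": 5.1},
    {"design_tool": "other", "qubit_frequency_GHz": 4.8},
]


def keep_low_frequency(row):
    """Return True for rows to keep, False for rows to drop."""
    return row["qubit_frequency_GHz"] < 5.0


# Built-in filter() keeps only the rows for which the predicate is True.
filtered_rows = list(filter(keep_low_frequency, rows))
```

With a real dataset, a predicate of this shape would presumably be passed as `filter_fn`.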
- squadds.database.HuggingFace.fork_dataset(repo_id, dataset_name, new_dataset_name, private=True)
Fork a dataset from Hugging Face Hub.
- Parameters:
repo_id (str) – The repo ID (namespace/repo) of the dataset to fork.
dataset_name (str) – Name of the dataset to fork.
new_dataset_name (str) – Name of the new dataset.
private (bool) – Whether the new dataset should be private or public.
- Returns:
None
- squadds.database.HuggingFace.load_hf_dataset(dataset_name, config=None)
Load a dataset from Hugging Face Hub.
- Parameters:
dataset_name (str) – The name or path of the dataset on the Hugging Face Hub.
config (str) – Specific configuration or version of the dataset.
- Returns:
Loaded dataset.
- Return type:
Dataset or DatasetDict
- squadds.database.HuggingFace.login_to_huggingface()
Log into Hugging Face using an API token from environment variables.
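A minimal sketch of reading an API token from the environment, which is the pattern the description above refers to. The variable name `HF_TOKEN_EXAMPLE` and the helper `read_token_from_env` are placeholders invented for this example; the name that SQuADDS actually reads is not stated here.

```python
import os


def read_token_from_env(var_name="HF_TOKEN_EXAMPLE"):
    """Return the token stored in the environment variable `var_name`.

    The default variable name is a placeholder (assumption), not the
    name used by SQuADDS.
    """
    token = os.environ.get(var_name)
    if not token:
        # Fail early with a clear message instead of passing None along.
        raise ValueError(f"Environment variable {var_name} is not set.")
    return token
```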
- squadds.database.HuggingFace.merge_datasets(dataset1, dataset2)
Merge two datasets into one.
- Parameters:
dataset1 (Dataset) – First dataset.
dataset2 (Dataset) – Second dataset.
- Returns:
Merged dataset.
- Return type:
Dataset
- squadds.database.HuggingFace.remove_column_from_dataset(dataset, column_name)
Remove a column from a dataset.
- Parameters:
dataset (Dataset) – Hugging Face dataset from which you want to remove a column.
column_name (str) – Name of the column to remove.
- Returns:
Dataset with the column removed.
- Return type:
Dataset
- squadds.database.HuggingFace.remove_row_from_dataset(dataset, row_index)
Remove a row from a dataset by index.
- Parameters:
dataset (Dataset) – Hugging Face dataset from which you want to remove a row.
row_index (int) – Index of the row to remove.
- Returns:
Dataset with the row removed.
- Return type:
Dataset
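One common way to drop a row by index from an immutable table is to select every index except the one being removed. The sketch below shows that idea on a plain list; the helper `indices_without` is hypothetical, and whether `remove_row_from_dataset` is implemented this way is an assumption.

```python
def indices_without(num_rows, row_index):
    """Return the indices of all rows except `row_index`."""
    if not 0 <= row_index < num_rows:
        raise IndexError(f"row_index {row_index} out of range for {num_rows} rows")
    return [i for i in range(num_rows) if i != row_index]


# A plain list standing in for a Dataset (assumption).
rows = ["row0", "row1", "row2", "row3"]
kept = [rows[i] for i in indices_without(len(rows), 2)]
```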
- squadds.database.HuggingFace.save_dataset_to_hf(dataset, repo_id, dataset_name, private=True)
Push a dataset to Hugging Face Hub.
- Parameters:
dataset (Dataset) – The dataset to push to Hugging Face Hub.
repo_id (str) – The repo ID (namespace/repo) on Hugging Face Hub.
dataset_name (str) – Name of the dataset on Hugging Face Hub.
private (bool) – Whether the dataset should be private or public.
- Returns:
None
- squadds.database.HuggingFace.update_column_in_dataset(dataset, column_name, new_column_data)
Update a specific column in the dataset.
- Parameters:
dataset (Dataset) – Hugging Face dataset to update.
column_name (str) – Name of the column to update.
new_column_data (list) – List of new data to replace the existing column.
- Returns:
Updated dataset.
- Return type:
Dataset
- squadds.database.HuggingFace.update_row_in_dataset(dataset, row_index, new_row_data)
Update an existing row in a dataset by index.
- Parameters:
dataset (Dataset) – Hugging Face dataset to update.
row_index (int) – Index of the row to update.
new_row_data (dict) – The new data for the row.
- Returns:
Updated dataset.
- Return type:
Dataset
- squadds.database.HuggingFace.view_column_in_dataset(dataset, column_name, num_values)
View a specific column in the dataset by its name.
- Parameters:
dataset (Dataset) – Hugging Face dataset.
column_name (str) – Name of the column to view.
num_values (int) – Number of values to view.
- Returns:
Data from the specified column.
- Return type:
list
squadds.database.abstract_upload_data module
squadds.database.checker module
squadds.database.config module
Helper methods to create config files.
squadds.database.contributor module
- class squadds.database.contributor.ExistingConfigData(config='')
Bases:
object
Represents an existing configuration data object.
- config
The name of the configuration.
- Type:
str
- sim_results
A dictionary containing simulation results.
- Type:
dict
- design
A dictionary containing design options and the design tool.
- Type:
dict
- sim_options
A dictionary containing simulation setup options.
- Type:
dict
- units
A set containing the units used in the simulation results.
- Type:
set
- notes
A dictionary containing additional notes.
- Type:
dict
- ref_entry
A dictionary containing the reference entry.
- Type:
dict
- contributor
A dictionary containing contributor information.
- Type:
dict
- entry
A dictionary containing the contribution data.
- Type:
dict
- local_repo_path
The local repository path.
- Type:
str
- sweep_data
A list containing sweep data.
- Type:
list
- __set_contributor_info()
Sets the contributor information.
- _validate_content_v0()
Validates the content of the contribution against the dataset schema.
- add_design(design)
Adds a design to the contribution.
- Parameters:
design (dict) – A dictionary containing design options and the design tool.
- add_design_v0(design)
Adds a design to the contribution.
- Parameters:
design (dict) – A dictionary containing design options and the design tool.
- add_notes(notes={})
Adds notes to the contribution.
- Parameters:
notes (dict) – A dictionary containing notes.
- add_sim_result(result_name, result_value, unit)
Add a simulation result to the contributor.
- Parameters:
result_name (str) – The name of the simulation result.
result_value (float) – The value of the simulation result.
unit (str) – The unit of measurement for the simulation result.
- Returns:
None
- add_sim_setup(sim_setup)
Adds simulation setup options to the contribution.
- Parameters:
sim_setup (dict) – A dictionary containing simulation setup options that match the configs schema.
- contribute(path_to_repo, is_sweep=False)
Contributes to the repository by updating the local repo, updating the database, and uploading to HF.
- Parameters:
path_to_repo (str) – The path to the repository.
is_sweep (bool) – True if the contribution is a sweep, False otherwise.
- Returns:
None
- from_json(json_file, is_sweep=False)
Loads a contribution from a JSON file.
- Parameters:
json_file (str) – The path to the JSON file.
is_sweep (bool) – True if the contribution is a sweep, False otherwise.
- get_config_schema()
Connects to the repository with the given configuration name. Chooses the first entry from the config dataset and extracts the schema.
- Returns:
A dictionary containing the schema for the given configuration name.
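To make "extracts the schema" concrete, the sketch below derives a schema from one entry by mapping each key to the name of its value's type and recursing into nested dictionaries. The function `extract_schema` and the sample entry are hypothetical; the schema format that `get_config_schema` actually returns may differ.

```python
def extract_schema(entry):
    """Map each key of `entry` to a type name, recursing into nested dicts."""
    schema = {}
    for key, value in entry.items():
        if isinstance(value, dict):
            schema[key] = extract_schema(value)
        else:
            schema[key] = type(value).__name__
    return schema


# Hypothetical first entry of a config dataset (assumption).
first_entry = {
    "design": {"design_tool": "qiskit-metal", "design_options": {"gap": "30um"}},
    "sim_results": {"frequency": 5.0, "unit": "GHz"},
}
schema = extract_schema(first_entry)
```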
- get_contributor_info()
Returns the contributor information.
- Returns:
The contributor information.
- Return type:
str
- property invalidate
Invalidates the contributor by setting the isValidated flag to False.
- property is_validated
Returns True if the contribution is validated, False otherwise.
- Returns:
True if the contribution is validated, False otherwise.
- Return type:
bool
- show_config_schema()
Connects to the repository with the given configuration name. Chooses the first entry from the config dataset and extracts the schema.
- Returns:
None
- to_dict()
Converts the Contributor object to a dictionary.
- Returns:
A dictionary representation of the Contributor object.
- Return type:
dict
- update_db(path_to_repo, is_sweep=False)
Updates the local repository with the validated data.
- Parameters:
path_to_repo (str) – The path to the local repository.
is_sweep (bool) – True if the contribution is a sweep, False otherwise.
- Raises:
ValueError – If the data has not been validated.
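The validate-before-write guard described above can be sketched as follows. `ContributionSketch` is a hypothetical stand-in written for this example, not the SQuADDS class, and its `update_db` only returns a message instead of touching a repository.

```python
class ContributionSketch:
    """Hypothetical stand-in for a contribution object (not the SQuADDS class)."""

    def __init__(self):
        self._is_validated = False

    def validate(self):
        # Real validation checks would run here; the sketch just sets the flag.
        self._is_validated = True

    def update_db(self, path_to_repo):
        # Refuse to write anything until validate() has succeeded.
        if not self._is_validated:
            raise ValueError("Data must be validated before updating the database.")
        return f"would update {path_to_repo}"
```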
- update_repo(path_to_repo)
Updates the repository at the specified path.
- Parameters:
path_to_repo (str) – The path to the repository.
- Raises:
subprocess.CalledProcessError – If the git commands fail.
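For context on where `subprocess.CalledProcessError` comes from: `subprocess.run` raises it when called with `check=True` and the command exits with a non-zero status. The sketch uses a Python one-liner as a stand-in for a failing git command, so it does not require git; the helper `run_checked` is hypothetical.

```python
import subprocess
import sys


def run_checked(command):
    """Run `command` and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True, capture_output=True, text=True)


# A command that exits with status 1, standing in for a failing git command.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
try:
    run_checked(failing)
    failed = False
except subprocess.CalledProcessError as err:
    failed = True
    exit_code = err.returncode
```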
- upload_to_HF(path_to_repo)
Uploads validated data to the specified repository.
- Parameters:
path_to_repo (str) – The path to the repository.
- Raises:
ValueError – If the data has not been validated.
subprocess.CalledProcessError – If the git commands fail.
- Returns:
None
- validate()
Validates the contribution by performing various checks.
- Raises:
Exception – If any validation check fails.
- validate_content(data)
Validates the content of the contribution against the dataset schema.
- Parameters:
data (dict) – The data to be validated.
- validate_structure(actual_structure)
Validates the structure of the contributor object.
- Parameters:
actual_structure (dict) – The actual structure of the contributor object.
- Raises:
ValueError – If any required key or sub-key is missing in the actual structure.
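A structure check of this kind can be sketched as a recursive comparison of required keys. The function `check_required_keys` and the `expected` layout are assumptions made for illustration; the keys that SQuADDS actually requires are not listed here.

```python
def check_required_keys(expected, actual, path=""):
    """Raise ValueError if `actual` lacks any key (or sub-key) of `expected`."""
    for key, sub_expected in expected.items():
        location = f"{path}.{key}" if path else key
        if key not in actual:
            raise ValueError(f"Missing required key: {location}")
        # Recurse only when the expected value is itself a dict of sub-keys.
        if isinstance(sub_expected, dict):
            if not isinstance(actual[key], dict):
                raise ValueError(f"Expected a dict at: {location}")
            check_required_keys(sub_expected, actual[key], location)


# Hypothetical required structure (assumption).
expected = {"design": {"design_tool": None}, "sim_results": {}}
```

The dotted `location` string makes the error message point at the exact missing sub-key.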
squadds.database.contributor_HF module
- class squadds.database.contributor_HF.Contribute(data_files)
Bases:
object
Class representing a contributor for dataset creation and upload.
- dataset_files
List of dataset file paths.
- Type:
list
- institute
Institution name.
- Type:
str
- pi_name
PI (Principal Investigator) name.
- Type:
str
- api
Hugging Face API object.
- Type:
HfApi
- token
Hugging Face API token.
- Type:
str
- dataset_name
Name of the dataset.
- Type:
str
- dataset_link
Link to the dataset.
- Type:
str
- check_for_api_key()
Checks for the presence of a Hugging Face API key.
- Returns:
api (HfApi) – Hugging Face API object.
token (str) – Hugging Face API token.
- Raises:
ValueError – If the Hugging Face token is not found.
- create_dataset_name(components, data_type, data_nature, data_source, date=None)
Creates a unique name for the dataset.
- Parameters:
components (list) – List of components.
data_type (str) – Type of the data.
data_nature (str) – Nature of the data.
data_source (str) – Source of the data.
date (str, optional) – Date of the dataset creation. Defaults to None.
- Returns:
Unique name for the dataset.
- Return type:
str
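The naming format itself is not documented here, so the following is only a sketch of how such a name could be composed from the listed parameters. The function `make_dataset_name`, the separators, the ISO date format, and the sample values are all assumptions.

```python
from datetime import date as _date


def make_dataset_name(components, data_type, data_nature, data_source, date=None):
    """Join the descriptive fields and a date into a single name string."""
    if date is None:
        # Fall back to today's date in ISO format (assumed convention).
        date = _date.today().isoformat()
    parts = ["-".join(components), data_type, data_nature, data_source, date]
    return "_".join(parts)


name = make_dataset_name(
    ["qubit", "cavity"], "cap_matrix", "sim", "ansys", date="2024-01-01"
)
```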
- create_dataset_repository(components, data_type, data_nature, data_source)
Creates a repository for the dataset on Hugging Face (if it doesn't exist).
- Parameters:
components (list) – List of components.
data_type (str) – Type of the data.
data_nature (str) – Nature of the data.
data_source (str) – Source of the data.
- get_dataset_link()
Retrieves the link to the dataset.
- Returns:
Link to the dataset.
- Return type:
str
- upload_dataset()
Uploads the dataset to Hugging Face.
- Raises:
NotImplementedError – If dataset upload is not implemented.
- upload_dataset_no_validation(components, data_type, data_nature, data_source, files, date=None)
Uploads the dataset to Hugging Face without validation.
- Parameters:
components (list) – List of components.
data_type (str) – Type of the data.
data_nature (str) – Nature of the data.
data_source (str) – Source of the data.
files (list) – List of file paths.
date (str, optional) – Date of the dataset creation. Defaults to None.
squadds.database.github module
- squadds.database.github.append_to_json(data, new_entry)
Append the new entry to the JSON data, ensuring the format is maintained.
- Parameters:
data (dict) – Existing JSON data.
new_entry (dict) – New entry to add to the JSON data.
- Returns:
Updated JSON data.
- Return type:
dict
- squadds.database.github.clone_repository(repo_url, clone_dir)
Clone the given repository into the specified directory.
- Parameters:
repo_url (str) – URL of the repository to clone.
clone_dir (str) – Path to the directory where the repo should be cloned.
- Returns:
The cloned Git repository object.
- Return type:
git.Repo
- squadds.database.github.commit_changes(repo, file_path, commit_message)
Commit changes to the specified file in the repository.
- Parameters:
repo (git.Repo) – The Git repository object.
file_path (str) – Path to the file to commit.
commit_message (str) – Commit message.
- Returns:
The commit hash if successful.
- Return type:
str
- squadds.database.github.contribute_measured_data(new_entry, pr_title='PR For Contributing New Data', pr_body='This PR is for contributing new data to the SQuADDS Measured Devices Database.')
Update the JSON file in the given repository by appending a new entry and committing the changes.
- Parameters:
new_entry (dict) – New entry to append to the JSON file.
pr_title (str) – The title of the pull request.
pr_body (str) – The body description of the pull request.
- Returns:
Commit hash if successful, None otherwise.
- Return type:
str
- squadds.database.github.create_pull_request(forked_repo_name, branch_name, pr_title, pr_body, github_token)
Creates a pull request from the specified branch in the forked repository to the original repository.
- Parameters:
forked_repo_name (str) – The full name of the forked repository (e.g., "your_username/repo_name").
branch_name (str) – The name of the branch from which to create the PR.
pr_title (str) – The title of the pull request.
pr_body (str) – The body description of the pull request.
github_token (str) – GitHub Personal Access Token for authentication.
- Returns:
URL of the created pull request if successful, None otherwise.
- Return type:
str
- squadds.database.github.fork_repository(github_token)
Forks the specified GitHub repository to the authenticated user's account.
- Parameters:
github_token (str) – GitHub Personal Access Token with appropriate permissions.
- Returns:
URL of the forked repository if successful, None otherwise.
- Return type:
str
- squadds.database.github.get_github_username(github_token)
Get the GitHub username associated with the provided token.
- Parameters:
github_token (str) – GitHub Personal Access Token.
- Returns:
GitHub username if successful, None otherwise.
- Return type:
str
- squadds.database.github.login_to_github()
Logs in to GitHub using a token stored in environment variables.
- squadds.database.github.push_changes(repo, branch_name='main', github_token=None)
Push the committed changes to the remote repository.
- Parameters:
repo (git.Repo) – The Git repository object.
branch_name (str) – The name of the branch to push to.
github_token (str) – GitHub Personal Access Token (optional).
- Returns:
True if push is successful, False otherwise.
- Return type:
bool
squadds.database.new_contribution module
- class squadds.database.new_contribution.ConfigMaker(component, component_name, data_type)
Bases:
object
- set_schema(ref_file=None, interactive=True)
# TODO: Implement create_metadata method (both interactive and non-interactive) for required fields
if interactive:
    self.set_design_fields()
    self.set_sim_options_fields()
    self.set_sim_results_fields()
    self.set_other_fields()
else:
    self.set_fields(ref_file)
squadds.database.utils module
Utilities for the database package.
- squadds.database.utils.copy_files_to_new_location(data_path, new_path)
Copy files from the given data path to the new location.
- Parameters:
data_path (str) – The path to the directory containing the files to be copied.
new_path (str) – The path to the directory where the files will be copied to.
- Returns:
None
- Raises:
None
- squadds.database.utils.create_contributor_info()
Prompt the user for information and update the .env file.
This function prompts the user to enter information such as institution name, group name, PI name, user name, and an optional contrib_misc. It then validates the input and updates the corresponding fields in the .env file. If the fields already exist in the .env file, the function prompts the user to confirm whether to overwrite the existing values.
- Raises:
ValueError – If any of the input fields are empty (except for contrib_misc).
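The validate-then-update flow described above can be sketched without the interactive prompts. The key names in `REQUIRED_FIELDS` and `OPTIONAL_FIELDS` and the helper `update_env_file` are assumptions made for this example; the real function also asks for confirmation before overwriting, which the sketch skips.

```python
import os

REQUIRED_FIELDS = ["INSTITUTION", "GROUP_NAME", "PI_NAME", "USER_NAME"]  # assumed keys
OPTIONAL_FIELDS = ["CONTRIB_MISC"]  # assumed key


def update_env_file(env_path, values):
    """Validate `values` and merge them into the KEY=VALUE file at `env_path`."""
    for field in REQUIRED_FIELDS:
        if not values.get(field, "").strip():
            raise ValueError(f"{field} cannot be empty.")

    # Read existing KEY=VALUE pairs, if the file is already there.
    existing = {}
    if os.path.exists(env_path):
        with open(env_path) as handle:
            for line in handle:
                if "=" in line:
                    key, _, value = line.strip().partition("=")
                    existing[key] = value

    # New values overwrite old ones (no confirmation step in this sketch).
    for field in REQUIRED_FIELDS + OPTIONAL_FIELDS:
        if field in values:
            existing[field] = values[field]

    with open(env_path, "w") as handle:
        for key, value in existing.items():
            handle.write(f"{key}={value}\n")
    return existing
```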