ResultSet module
- class remayn.result_set.ResultFolder(base_path)[source]
Stores a set of Result objects loaded from a directory. For each Result object, it stores the metadata of the experiment, such as the experiment info and the path where the result is stored. The predictions and targets are not loaded until needed.
- base_path
The path where the results are stored.
- Type:
Path
- load()[source]
Loads the experiment info of all the results from the base_path directory. Only the metadata of the experiments is loaded, while the ResultData is not loaded until needed.
It retrieves all the json files from base_path and checks that each json file has a corresponding pkl file. If the number of json files does not match the number of pkl files, a ValueError is raised.
- Raises:
ValueError – If the pickle file for a given json file is not found.
Examples
>>> from remayn.result_set import ResultFolder
>>> rf = ResultFolder("./results")
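The json/pkl pairing check described for load() can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the helper names find_unpaired_json and check_pairs are hypothetical, the search is assumed to be non-recursive, and each result is assumed to be stored as <name>.json next to <name>.pkl.

```python
from pathlib import Path
from typing import List


def find_unpaired_json(base_path: str) -> List[Path]:
    """Return the json files in base_path that have no sibling pkl file.

    Hypothetical helper: assumes a pair shares the same file stem.
    """
    base = Path(base_path)
    return sorted(
        p for p in base.glob("*.json") if not p.with_suffix(".pkl").exists()
    )


def check_pairs(base_path: str) -> int:
    """Raise ValueError if a json file lacks its pkl; return the pair count."""
    missing = find_unpaired_json(base_path)
    if missing:
        raise ValueError(f"Missing pickle file for: {missing[0].name}")
    return len(list(Path(base_path).glob("*.json")))
```

Running a check of this kind before creating a ResultFolder may help locate an incomplete result when load() raises ValueError.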
- class remayn.result_set.ResultSet(results: List[Result] | Set[Result] | Dict[str, Result])[source]
Stores a set of Result objects. For each Result object, it stores the metadata of the experiment, such as the experiment info and the path where the result is stored. The predictions and targets are not loaded until needed.
- results_
Dictionary that contains the config of the experiment as the key and the Result object as the value.
- Type:
Dict[str, Result]
- add(result: Result)[source]
Adds a Result object to the ResultSet. If the result already exists in the ResultSet, it will be replaced by the new one.
- Parameters:
result (Result) – The Result object to add.
- Raises:
TypeError – If the result parameter is not a Result.
- contains(result: str | dict | Result) bool[source]
Checks if the ResultSet contains the given Result.
- Parameters:
result (Union[str, dict, Result]) – The result that is searched in the ResultSet. It can be: - a string: the config dict of the result transformed to string, - a dict: the config dict of the result, - a Result object.
If the result is a string, it is employed as the key to search in the ResultSet. If the result is a dict, it is json sanitized and transformed to string to be used as the key. If the result is a Result object, its config is sanitized and transformed to string to be used as the key.
- Returns:
contains – Whether the ResultSet contains the given result.
- Return type:
bool
- Raises:
TypeError – If the result parameter is not a str, a dict or a Result.
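The way a lookup argument is turned into a string key can be sketched as follows. This is only an approximation: the function name to_key is hypothetical, sorted-key JSON stands in for the library's own sanitization (whose exact output format is not documented here), and the Result case is omitted because it reduces to the dict case applied to the result's config.

```python
import json
from typing import Union


def to_key(result: Union[str, dict]) -> str:
    """Normalise a lookup argument to a string key (illustrative only)."""
    if isinstance(result, str):
        # A string is used as the key directly.
        return result
    if isinstance(result, dict):
        # Assumption: sorted-key JSON replaces the library's sanitization step.
        return json.dumps(result, sort_keys=True, default=str)
    raise TypeError("result must be a str, a dict or a Result")


# A plain dict plays the role of the results_ mapping.
results = {to_key({"lr": 0.1, "seed": 0}): "stored result"}

# Key order in the query dict does not matter once the dict is normalised.
print(to_key({"seed": 0, "lr": 0.1}) in results)  # True
```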
- create_dataframe(config_columns: List[str] = [], filter_fn: Callable[[Result], bool] = <function ResultSet.<lambda>>, metrics_fn: Callable[[ndarray, ndarray], Dict[str, float]] = <function ResultSet.<lambda>>, include_train: bool = False, include_val: bool = False, best_params_columns: List[str] = [], n_jobs: int = -1, config_columns_prefix: str = 'config_', best_params_columns_prefix: str = 'best_', raise_errors: Literal['error', 'warning', 'ignore'] = 'error')[source]
Creates a pandas.DataFrame that contains all the results stored in this ResultSet. The DataFrame will contain the columns specified in config_columns, best_params_columns, and the metrics computed by metrics_fn. The metrics will be computed on the test set by default, but the train and validation metrics can be included using the include_train and include_val flags. If the filter_fn parameter is provided, only the results that satisfy the condition will be included in the DataFrame.
- Parameters:
config_columns (List[str], optional, default=[]) – List of columns from the config to include in the dataframe.
filter_fn (Callable[[Result], bool], optional, default=lambda result: True) – Function that selects the results to include in the dataframe. It receives a single parameter, the Result object being processed, and must return a boolean value. If it returns True, the row for that result is included.
metrics_fn (Callable[[np.ndarray, np.ndarray], Dict[str, float]]) – Function that computes the metrics from the targets and predictions. It receives two numpy arrays, the targets and the predictions, and returns a dictionary where the key is the name of the metric and the value is the value of the metric. The shape of the numpy arrays depends on the kind of data that is stored within the ResultData. While any shape can be valid, the implementation of the metrics function must be coherent with the data stored in the ResultData.
include_train (bool, optional, default=False) – Whether to include the metrics computed on the train set.
include_val (bool, optional, default=False) – Whether to include the metrics computed on the validation set.
best_params_columns (List[str], optional, default=[]) – List of columns from the best_params list to include in the dataframe.
n_jobs (int, optional, default=-1) – The number of jobs to run in parallel (must not be 0). If -1, all CPUs are used. If 1, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used; thus for n_jobs = -2, all CPUs but one are used. joblib is used for parallel processing. If it is not installed, n_jobs is set to 1 and a warning is issued. The parallel backend is not specified within this function, so the user can set it using the parallel_backend of the joblib API.
config_columns_prefix (str, optional, default = "config_") – The prefix to add to the config columns. If empty string, no prefix is added. Note that using an empty prefix can result in column name conflicts.
best_params_columns_prefix (str, optional, default = "best_") – The prefix to add to the best_params columns. If empty string, no prefix is added. Note that using an empty prefix can result in column name conflicts.
raise_errors (Literal["error", "warning", "ignore"], optional, default="error") – Defines the behaviour when an error occurs during the creation of a row. See remayn.result_set.utils.get_metric_columns_values for more details.
- Returns:
df – The dataframe with the results.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If n_jobs is set to 0.
ValueError – If raise_errors is not one of "error", "warning" or "ignore".
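A metrics_fn with the documented shape (two numpy arrays in, a dictionary of metric names and values out) might look like the sketch below. The function name accuracy_and_mae is hypothetical, and the layout of the arrays is an assumption: predictions are taken to be either per-class scores of shape (n_samples, n_classes) or already-decoded integer labels.

```python
from typing import Dict

import numpy as np


def accuracy_and_mae(targets: np.ndarray, predictions: np.ndarray) -> Dict[str, float]:
    """Example metrics_fn: receives (targets, predictions), returns name -> value."""
    # Assumption: 2-D predictions hold per-class scores and are decoded with
    # argmax; 1-D predictions are treated as labels already.
    labels = predictions.argmax(axis=1) if predictions.ndim == 2 else predictions
    return {
        "accuracy": float(np.mean(labels == targets)),
        "mae": float(np.mean(np.abs(labels - targets))),
    }
```

A function of this kind would be passed as metrics_fn=accuracy_and_mae; each key of the returned dictionary then names one metric in the resulting DataFrame.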
- filter(filter_fn: Callable[[Result], bool]) ResultSet[source]
Create a copy of this ResultSet filtered by a function. The new ResultSet contains the results that satisfy the condition given by the filter function.
- Parameters:
filter_fn (Callable[[Result], bool]) – Function to filter the results. If it returns True, the result will be included in the new ResultSet. The function receives a single parameter which is the Result object being processed. It must return a boolean value.
- Returns:
results – A ResultSet that contains only the results that satisfy the condition.
- Return type:
ResultSet
- Raises:
TypeError – If the filter_fn parameter is not a callable.
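The contract of filter_fn can be illustrated without the library, using a stand-in object. Everything named here is hypothetical: StubResult replaces Result, its config attribute name is an assumption, and apply_filter only mimics the documented behaviour (a new collection is returned, and a non-callable argument raises TypeError).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StubResult:
    """Stand-in for Result; the `config` attribute name is an assumption."""
    config: Dict[str, float] = field(default_factory=dict)


def keep_small_lr(result: StubResult) -> bool:
    """Example filter_fn: True keeps the result, False drops it."""
    return result.config.get("lr", 1.0) <= 0.01


def apply_filter(
    results: List[StubResult], filter_fn: Callable[[StubResult], bool]
) -> List[StubResult]:
    if not callable(filter_fn):
        raise TypeError("filter_fn must be callable")
    # Build a new list; the input collection is left untouched.
    return [r for r in results if filter_fn(r)]


stubs = [StubResult({"lr": 0.1}), StubResult({"lr": 0.01}), StubResult({"lr": 0.001})]
kept = apply_filter(stubs, keep_small_lr)
print([r.config["lr"] for r in kept])  # [0.01, 0.001]
```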
- filter_by_config(config: dict) ResultSet[source]
Create a copy of this ResultSet filtered by config. The new ResultSet contains the results that match the given config.
- Parameters:
config (dict) – The config fields to filter by. To add a result to the filtered set, the result's config must contain all the fields in the config parameter with the same values. For example, if config={"a": 1, "b": 2}, the result config must contain both fields with those values. However, the result config can contain additional fields not listed in the provided config dictionary.
- Returns:
results – A ResultSet that contains only the results that match the given config.
- Return type:
ResultSet
- Raises:
TypeError – If the config parameter is not a dict.
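The matching rule for filter_by_config amounts to a subset test on the config fields. A minimal sketch follows; the function name config_matches is hypothetical, and it compares top-level values with plain equality, which is an assumption since the handling of nested configs is not described here.

```python
from typing import Any, Dict


def config_matches(result_config: Dict[str, Any], wanted: Dict[str, Any]) -> bool:
    """True when every field in `wanted` appears in result_config with the same value."""
    if not isinstance(wanted, dict):
        raise TypeError("config must be a dict")
    # Extra fields in result_config are allowed; missing or different ones are not.
    return all(k in result_config and result_config[k] == v for k, v in wanted.items())


print(config_matches({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2}))  # True
print(config_matches({"a": 1}, {"a": 1, "b": 2}))  # False
```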
- remove(key: str | dict | Result)[source]
Removes a Result object identified by a key from the ResultSet.
- Parameters:
key (Union[str, dict, Result]) – The key of the Result to remove. It can be given as: - a string: the config dict of the result transformed to string, - a dict: the config dict of the result, - a Result object.
- Raises:
TypeError – If the key parameter is not a str, a dict or a Result.
KeyError – If the key is not found in the ResultSet.