rg.Dataset.records

Usage Examples

In most cases, you will not need to create a DatasetRecords object directly. Instead, you can access it via the Dataset object:

dataset.records
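
For context, a minimal sketch of how the dataset object is typically obtained first. The URL, API key, and dataset name are placeholders, and looking the dataset up by name is an assumption about the client API:

import argilla_sdk as rg

# Placeholder connection details; adjust for your Argilla server
client = rg.Argilla(api_url="http://localhost:6900", api_key="argilla.apikey")

# Assumes an existing dataset with this name on the server
dataset = client.datasets("my_dataset")

dataset.records  # the DatasetRecords accessor used in the examples below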

Adding records to a dataset

To add records to a dataset, use the add method. Records can be passed as dictionaries or as Record objects, either one at a time or as a list; an example using Record objects follows the dictionary example below.

# Add records to a dataset
dataset.records.add(
    records=[
        {
            "question": "What is the capital of France?",  # 'question' matches the `rg.TextField` name
            "answer": "Paris",  # 'answer' matches the `rg.TextQuestion` name
        },
        {
            "question": "What is the capital of Germany?",
            "answer": "Berlin",
        },
    ]
)
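
Records can also be constructed as Record objects before adding them, which is useful when a suggestion needs extra attributes such as a score or an agent. A minimal sketch, assuming the same 'question' field and 'answer' question as above and that rg.Record and rg.Suggestion are exposed at the package level:

dataset.records.add(
    records=[
        rg.Record(
            fields={"question": "What is the capital of Italy?"},
            suggestions=[
                # question_name ties the suggestion to the 'answer' question; score and agent are optional
                rg.Suggestion(question_name="answer", value="Rome", score=0.9, agent="example-model"),
            ],
        ),
    ]
)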

When adding records from an external data source, a mapping can be provided to map the keys in the source data to the fields and questions in Argilla. Dot notation can be used in the mapping values to target a record's suggestions and responses (see the sketch after the example below).

dataset.records.add(
    records=[
        {"input": "What is the capital of France?", "output": "Paris"},
        {"input": "What is the capital of Germany?", "output": "Berlin"},
    ],
    mapping={"input": "question", "output": "answer"}, # Maps 'input' to 'question' and 'output' to 'answer'
)
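
Mapping values can also use dot notation to route a source key to a question's suggestion or response, or to a suggestion sub-attribute such as score or agent. A hedged sketch, where 'model_score' and 'human_answer' are invented source keys:

dataset.records.add(
    records=[
        {
            "input": "What is the capital of France?",
            "output": "Paris",
            "model_score": 0.9,
            "human_answer": "Paris",
        },
    ],
    mapping={
        "input": "question",
        "output": "answer.suggestion",             # stored as a suggestion for the 'answer' question
        "model_score": "answer.suggestion.score",  # suggestion sub-attribute
        "human_answer": "answer.response",         # stored as a response to the 'answer' question
    },
)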

Iterating over records in a dataset

Dataset.records can be used to iterate over the records in a dataset. The records are fetched from the server in batches:

for record in dataset.records:
    print(record)

# Fetch records with suggestions and responses
for record in dataset.records(with_suggestions=True, with_responses=True):
    print(record.suggestions)
    print(record.responses)

# Filter records by a query and fetch records with vectors
for record in dataset.records(query="capital", with_vectors=True):
    print(record.vectors)

Check out the rg.Record class reference for more information on the properties and methods available on a record, and the rg.Query class reference for more information on the query syntax.
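
A query string is shorthand for a Query object, and the iterator parameters can be tuned explicitly. A minimal sketch:

# Equivalent to query="capital", with an explicit batch size and offset
for record in dataset.records(query=rg.Query(query="capital"), batch_size=100, start_offset=0):
    print(record)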

Updating records in a dataset

Records can also be updated by passing the id that identifies each record to update:

# Add records to a dataset
dataset.records.add(
    records=[
        {
            "id": "1",
            "question": "What is the capital of France?",
            "answer": "F",
        },
        {
            "id": "2",
            "question": "What is the capital of Germany?",
            "answer": "Berlin"
        },
    ]
)

# Update records in a dataset
dataset.records.update(
    records=[
        {
            "id": "1",  # matches id used in `Dataset.records.add`
            "question": "What is the capital of France?",
            "answer": "Paris",
        }
    ]
)
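
update accepts the same mapping and user_id parameters as add, so source values can also be routed to responses while updating. A hedged sketch, where 'human_answer' is an invented source key and responses default to the current user when user_id is omitted:

dataset.records.update(
    records=[{"id": "1", "human_answer": "Paris"}],
    mapping={"human_answer": "answer.response"},  # stored as a response to the 'answer' question
)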

Exporting records from a dataset

Records can also be exported from Dataset.records. Generic Python exports are available through the to_dict and to_list methods; exporting to JSON or to a Hugging Face dataset is sketched after the examples below.

dataset.records.to_dict()
# {"text": ["Hello", "World"], "label": ["greeting", "greeting"]}

dataset.records.to_list()
# [{"text": "Hello", "label": "greeting"}, {"text": "World", "label": "greeting"}]
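
Records can also be written to a JSON file (and loaded back with from_json) or exported to a Hugging Face dataset. A short sketch, with a placeholder file path:

# Export to and re-import from a JSON file on disk
dataset.records.to_json("records.json")
dataset.records.from_json("records.json")

# Export to a Hugging Face datasets.Dataset
hf_dataset = dataset.records.to_datasets()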

Class Reference

rg.Dataset.records

Bases: Iterable[Record], LoggingMixin

This class is used to work with records from a dataset and is accessed via Dataset.records. The responsibility of this class is to provide an interface to interact with records in a dataset, by adding, updating, fetching, querying, deleting, and exporting records.

Attributes:

    client (Argilla): The Argilla client object.
    dataset (Dataset): The dataset object.

Source code in src/argilla_sdk/records/_dataset_records.py
class DatasetRecords(Iterable[Record], LoggingMixin):
    """This class is used to work with records from a dataset and is accessed via `Dataset.records`.
    The responsibility of this class is to provide an interface to interact with records in a dataset,
    by adding, updating, fetching, querying, deleting, and exporting records.

    Attributes:
        client (Argilla): The Argilla client object.
        dataset (Dataset): The dataset object.
    """

    _api: RecordsAPI

    DEFAULT_BATCH_SIZE = 256

    def __init__(self, client: "Argilla", dataset: "Dataset"):
        """Initializes a DatasetRecords object with a client and a dataset.
        Args:
            client: An Argilla client object.
            dataset: A Dataset object.
        """
        self.__client = client
        self.__dataset = dataset
        self._api = self.__client.api.records

    def __iter__(self):
        return DatasetRecordsIterator(self.__dataset, self.__client)

    def __call__(
        self,
        query: Optional[Union[str, Query]] = None,
        batch_size: Optional[int] = DEFAULT_BATCH_SIZE,
        start_offset: int = 0,
        with_suggestions: bool = True,
        with_responses: bool = True,
        with_vectors: Optional[Union[List, bool, str]] = None,
    ) -> DatasetRecordsIterator:
        """Returns an iterator over the records in the dataset on the server.

        Parameters:
            query: A string or a Query object to filter the records.
            batch_size: The number of records to fetch in each batch. The default is 256.
            start_offset: The offset from which to start fetching records. The default is 0.
            with_suggestions: Whether to include suggestions in the records. The default is True.
            with_responses: Whether to include responses in the records. The default is True.
            with_vectors: A list of vector names to include in the records. The default is None.
                If a list is provided, only the specified vectors will be included.
                If True is provided, all vectors will be included.

        Returns:
            An iterator over the records in the dataset on the server.

        """
        if query and isinstance(query, str):
            query = Query(query=query)

        if with_vectors:
            self._validate_vector_names(vector_names=with_vectors)

        return DatasetRecordsIterator(
            self.__dataset,
            self.__client,
            query=query,
            batch_size=batch_size,
            start_offset=start_offset,
            with_suggestions=with_suggestions,
            with_responses=with_responses,
            with_vectors=with_vectors,
        )

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}({self.__dataset})"

    ############################
    # Public methods
    ############################

    def add(
        self,
        records: Union[dict, List[dict], Record, List[Record], HFDataset],
        mapping: Optional[Dict[str, str]] = None,
        user_id: Optional[UUID] = None,
        batch_size: int = DEFAULT_BATCH_SIZE,
    ) -> List[Record]:
        """
        Add new records to a dataset on the server.

        Parameters:
            records: A dictionary or a list of dictionaries representing the records
                     to be added to the dataset. Records are defined as dictionaries
                     with keys corresponding to the fields in the dataset schema.
            mapping: A dictionary that maps the keys in the records to the fields in the dataset schema.
            user_id: The user id to be associated with the records. If not provided, the current user id is used.
            batch_size: The number of records to send in each batch. The default is 256.

        Returns:
            A list of Record objects representing the added records.

        Examples:

        Add generic records to a dataset as dictionaries:

        """
        record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id or self.__client.me.id)
        batch_size = self._normalize_batch_size(
            batch_size=batch_size,
            records_length=len(record_models),
            max_value=self._api.MAX_RECORDS_PER_CREATE_BULK,
        )

        created_records = []
        for batch in range(0, len(record_models), batch_size):
            self.log(message=f"Sending records from {batch} to {batch + batch_size}.")
            batch_records = record_models[batch : batch + batch_size]
            models = self._api.bulk_create(dataset_id=self.__dataset.id, records=batch_records)
            created_records.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])

        self.log(
            message=f"Added {len(created_records)} records to dataset {self.__dataset.name}",
            level="info",
        )

        return created_records

    def update(
        self,
        records: Union[dict, List[dict], Record, List[Record], HFDataset],
        mapping: Optional[Dict[str, str]] = None,
        user_id: Optional[UUID] = None,
        batch_size: int = DEFAULT_BATCH_SIZE,
    ) -> List[Record]:
        """Update records in a dataset on the server using the provided records
            and matching based on the external_id or id.

        Parameters:
            records: A dictionary or a list of dictionaries representing the records
                     to be updated in the dataset. Records are defined as dictionaries
                     with keys corresponding to the fields in the dataset schema. Ids or
                     external_ids should be provided to identify the records to be updated.
            mapping: A dictionary that maps the keys in the records to the fields in the dataset schema.
            user_id: The user id to be associated with the records. If not provided, the current user id is used.
            batch_size: The number of records to send in each batch. The default is 256.

        Returns:
            A list of Record objects representing the updated records.

        """
        record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id or self.__client.me.id)
        batch_size = self._normalize_batch_size(
            batch_size=batch_size,
            records_length=len(record_models),
            max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,
        )

        created_or_updated = []
        records_updated = 0
        for batch in range(0, len(record_models), batch_size):
            self.log(message=f"Sending records from {batch} to {batch + batch_size}.")
            batch_records = record_models[batch : batch + batch_size]
            models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)
            created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])
            records_updated += updated

        records_created = len(created_or_updated) - records_updated
        self.log(
            message=f"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}",
            level="info",
        )

        return created_or_updated

    def to_dict(self, flatten: bool = False, orient: str = "names") -> Dict[str, Any]:
        """
        Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().

        Parameters:
            flatten (bool): The structure of the exported dictionary.
                - True: The record fields, metadata, suggestions and responses will be flattened.
                - False: The record fields, metadata, suggestions and responses will be nested.
            orient (str): The orientation of the exported dictionary.
                - "names": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.
                - "index": The keys of the dictionary will be the id of the records.
        Returns:
            A dictionary of records.

        """
        records = list(self(with_suggestions=True, with_responses=True))
        data = GenericIO.to_dict(records=records, flatten=flatten, orient=orient)
        return data

    def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:
        """
        Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().

        Parameters:
            flatten (bool): Whether to flatten the dictionary and use dot notation for nested keys like suggestions and responses.

        Returns:
            A list of dictionaries of records.
        """
        records = list(self(with_suggestions=True, with_responses=True))
        data = GenericIO.to_list(records=records, flatten=flatten)
        return data

    def to_json(self, path: Union[Path, str]) -> Path:
        """
        Export the records to a file on disk.

        Parameters:
            path (str): The path to the file to save the records.

        Returns:
            The path to the file where the records were saved.

        """
        records = list(self(with_suggestions=True, with_responses=True))
        return JsonIO.to_json(records=records, path=path)

    def from_json(self, path: Union[Path, str]) -> "DatasetRecords":
        """Creates a DatasetRecords object from a disk path to a JSON file.
            The JSON file should be defined by `DatasetRecords.to_json`.

        Args:
            path (str): The path to the file containing the records.

        Returns:
            DatasetRecords: The DatasetRecords object created from the disk path.

        """
        records = JsonIO._records_from_json(path=path)
        self.update(records=records)
        return self

    def to_datasets(self) -> HFDataset:
        """
        Export the records to a HFDataset.

        Returns:
            The dataset containing the records.

        """
        records = list(self(with_suggestions=True, with_responses=True))
        return HFDatasetsIO.to_datasets(records=records)

    ############################
    # Private methods
    ############################

    def _ingest_records(
        self,
        records: Union[List[Dict[str, Any]], Dict[str, Any], List[Record], Record, HFDataset],
        mapping: Optional[Dict[str, str]] = None,
        user_id: Optional[UUID] = None,
    ) -> List[RecordModel]:
        if isinstance(records, (Record, dict)):
            records = [records]
        if HFDatasetsIO._is_hf_dataset(dataset=records):
            records = HFDatasetsIO._record_dicts_from_datasets(dataset=records)
        if all(map(lambda r: isinstance(r, dict), records)):
            # Records as flat dicts of values to be matched to questions as suggestion or response
            records = [self._infer_record_from_mapping(data=r, mapping=mapping, user_id=user_id) for r in records]  # type: ignore
        elif all(map(lambda r: isinstance(r, Record), records)):
            for record in records:
                record.dataset = self.__dataset
        else:
            raise ValueError(
                "Records should be a dictionary, a list of dictionaries, a Record instance, "
                "a list of Record instances, or `datasets.Dataset`."
            )
        return [record.api_model() for record in records]

    def _normalize_batch_size(self, batch_size: int, records_length, max_value: int):
        norm_batch_size = min(batch_size, records_length, max_value)

        if batch_size != norm_batch_size:
            self.log(
                message=f"The provided batch size {batch_size} was normalized. Using value {norm_batch_size}.",
                level="warning",
            )

        return norm_batch_size

    def _validate_vector_names(self, vector_names: Union[List[str], str]) -> None:
        if not isinstance(vector_names, list):
            vector_names = [vector_names]
        for vector_name in vector_names:
            if isinstance(vector_name, bool):
                continue
            if vector_name not in self.__dataset.schema:
                raise ValueError(f"Vector field {vector_name} not found in dataset schema.")

    def _infer_record_from_mapping(
        self,
        data: dict,
        mapping: Optional[Dict[str, str]] = None,
        user_id: Optional[UUID] = None,
    ) -> "Record":
        """Converts a mapped record dictionary to a Record object for use by the add or update methods.
        Args:
            dataset: The dataset object to which the record belongs.
            data: A dictionary representing the record.
            mapping: A dictionary mapping source data keys to Argilla fields, questions, and ids.
            user_id: The user id to associate with the record responses.
        Returns:
            A Record object.
        """
        fields: Dict[str, str] = {}
        responses: List[Response] = []
        record_id: Optional[str] = None
        suggestion_values = defaultdict(dict)
        vectors: List[Vector] = []
        metadata: Dict[str, MetadataValue] = {}

        schema = self.__dataset.schema

        for attribute, value in data.items():
            schema_item = schema.get(attribute)
            attribute_type = None
            sub_attribute = None

            # Map source data keys using the mapping
            if mapping and attribute in mapping:
                attribute_mapping = mapping.get(attribute)
                attribute_mapping = attribute_mapping.split(".")
                attribute = attribute_mapping[0]
                schema_item = schema.get(attribute)
                if len(attribute_mapping) > 1:
                    attribute_type = attribute_mapping[1]
                if len(attribute_mapping) > 2:
                    sub_attribute = attribute_mapping[2]
            elif schema_item is mapping is None and attribute != "id":
                warnings.warn(
                    message=f"""Record attribute {attribute} is not in the schema so skipping.
                        Define a mapping to map source data fields to Argilla Fields, Questions, and ids
                        """
                )
                continue

            if attribute == "id":
                record_id = value
                continue

            # Add suggestion values to the suggestions
            if attribute_type == "suggestion":
                if sub_attribute in ["score", "agent"]:
                    suggestion_values[attribute][sub_attribute] = value

                elif sub_attribute is None:
                    suggestion_values[attribute].update(
                        {"value": value, "question_name": attribute, "question_id": schema_item.id}
                    )
                else:
                    warnings.warn(
                        message=f"Record attribute {sub_attribute} is not a valid suggestion sub_attribute so skipping."
                    )
                continue

            # Assign the value to question, field, or response based on schema item
            if isinstance(schema_item, TextField):
                fields[attribute] = value
            elif isinstance(schema_item, QuestionPropertyBase) and attribute_type == "response":
                responses.append(Response(question_name=attribute, value=value, user_id=user_id))
            elif isinstance(schema_item, QuestionPropertyBase) and attribute_type is None:
                suggestion_values[attribute].update(
                    {"value": value, "question_name": attribute, "question_id": schema_item.id}
                )
            elif isinstance(schema_item, VectorField):
                vectors.append(Vector(name=attribute, values=value))
            elif isinstance(schema_item, MetadataPropertyBase):
                metadata[attribute] = value
            else:
                warnings.warn(message=f"Record attribute {attribute} is not in the schema or mapping so skipping.")
                continue

        suggestions = [Suggestion(**suggestion_dict) for suggestion_dict in suggestion_values.values()]

        return Record(
            id=record_id,
            fields=fields,
            suggestions=suggestions,
            responses=responses,
            vectors=vectors,
            metadata=metadata,
            _dataset=self.__dataset,
        )

__call__(query=None, batch_size=DEFAULT_BATCH_SIZE, start_offset=0, with_suggestions=True, with_responses=True, with_vectors=None)

Returns an iterator over the records in the dataset on the server.

Parameters:

    query (Optional[Union[str, Query]], default: None): A string or a Query object to filter the records.
    batch_size (Optional[int], default: DEFAULT_BATCH_SIZE): The number of records to fetch in each batch. The default is 256.
    start_offset (int, default: 0): The offset from which to start fetching records.
    with_suggestions (bool, default: True): Whether to include suggestions in the records.
    with_responses (bool, default: True): Whether to include responses in the records.
    with_vectors (Optional[Union[List, bool, str]], default: None): A list of vector names to include in the records. If a list is provided, only the specified vectors will be included. If True is provided, all vectors will be included.

Returns:

    DatasetRecordsIterator: An iterator over the records in the dataset on the server.
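
A usage sketch, assuming the dataset defines a vector field named "sentence_embedding":

for record in dataset.records(query="capital", with_vectors=["sentence_embedding"]):
    print(record.vectors)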

Source code in src/argilla_sdk/records/_dataset_records.py
def __call__(
    self,
    query: Optional[Union[str, Query]] = None,
    batch_size: Optional[int] = DEFAULT_BATCH_SIZE,
    start_offset: int = 0,
    with_suggestions: bool = True,
    with_responses: bool = True,
    with_vectors: Optional[Union[List, bool, str]] = None,
) -> DatasetRecordsIterator:
    """Returns an iterator over the records in the dataset on the server.

    Parameters:
        query: A string or a Query object to filter the records.
        batch_size: The number of records to fetch in each batch. The default is 256.
        start_offset: The offset from which to start fetching records. The default is 0.
        with_suggestions: Whether to include suggestions in the records. The default is True.
        with_responses: Whether to include responses in the records. The default is True.
        with_vectors: A list of vector names to include in the records. The default is None.
            If a list is provided, only the specified vectors will be included.
            If True is provided, all vectors will be included.

    Returns:
        An iterator over the records in the dataset on the server.

    """
    if query and isinstance(query, str):
        query = Query(query=query)

    if with_vectors:
        self._validate_vector_names(vector_names=with_vectors)

    return DatasetRecordsIterator(
        self.__dataset,
        self.__client,
        query=query,
        batch_size=batch_size,
        start_offset=start_offset,
        with_suggestions=with_suggestions,
        with_responses=with_responses,
        with_vectors=with_vectors,
    )

__init__(client, dataset)

Initializes a DatasetRecords object with a client and a dataset.

Args:
    client: An Argilla client object.
    dataset: A Dataset object.

Source code in src/argilla_sdk/records/_dataset_records.py
def __init__(self, client: "Argilla", dataset: "Dataset"):
    """Initializes a DatasetRecords object with a client and a dataset.
    Args:
        client: An Argilla client object.
        dataset: A Dataset object.
    """
    self.__client = client
    self.__dataset = dataset
    self._api = self.__client.api.records

add(records, mapping=None, user_id=None, batch_size=DEFAULT_BATCH_SIZE)

Add new records to a dataset on the server.

Parameters:

    records (Union[dict, List[dict], Record, List[Record], HFDataset], required): A dictionary or a list of dictionaries representing the records to be added to the dataset. Records are defined as dictionaries with keys corresponding to the fields in the dataset schema.
    mapping (Optional[Dict[str, str]], default: None): A dictionary that maps the keys in the records to the fields in the dataset schema.
    user_id (Optional[UUID], default: None): The user id to be associated with the records. If not provided, the current user id is used.
    batch_size (int, default: DEFAULT_BATCH_SIZE): The number of records to send in each batch. The default is 256.

Returns:

    List[Record]: A list of Record objects representing the added records.

Examples:

Add generic records to a dataset as dictionaries:

Source code in src/argilla_sdk/records/_dataset_records.py
def add(
    self,
    records: Union[dict, List[dict], Record, List[Record], HFDataset],
    mapping: Optional[Dict[str, str]] = None,
    user_id: Optional[UUID] = None,
    batch_size: int = DEFAULT_BATCH_SIZE,
) -> List[Record]:
    """
    Add new records to a dataset on the server.

    Parameters:
        records: A dictionary or a list of dictionaries representing the records
                 to be added to the dataset. Records are defined as dictionaries
                 with keys corresponding to the fields in the dataset schema.
        mapping: A dictionary that maps the keys in the records to the fields in the dataset schema.
        user_id: The user id to be associated with the records. If not provided, the current user id is used.
        batch_size: The number of records to send in each batch. The default is 256.

    Returns:
        A list of Record objects representing the added records.

    Examples:

    Add generic records to a dataset as dictionaries:

    """
    record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id or self.__client.me.id)
    batch_size = self._normalize_batch_size(
        batch_size=batch_size,
        records_length=len(record_models),
        max_value=self._api.MAX_RECORDS_PER_CREATE_BULK,
    )

    created_records = []
    for batch in range(0, len(record_models), batch_size):
        self.log(message=f"Sending records from {batch} to {batch + batch_size}.")
        batch_records = record_models[batch : batch + batch_size]
        models = self._api.bulk_create(dataset_id=self.__dataset.id, records=batch_records)
        created_records.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])

    self.log(
        message=f"Added {len(created_records)} records to dataset {self.__dataset.name}",
        level="info",
    )

    return created_records

from_json(path)

Creates a DatasetRecords object from a disk path to a JSON file. The JSON file should be defined by DatasetRecords.to_json.

Parameters:

    path (str, required): The path to the file containing the records.

Returns:

    DatasetRecords: The DatasetRecords object created from the disk path.

Source code in src/argilla_sdk/records/_dataset_records.py
def from_json(self, path: Union[Path, str]) -> "DatasetRecords":
    """Creates a DatasetRecords object from a disk path to a JSON file.
        The JSON file should be defined by `DatasetRecords.to_json`.

    Args:
        path (str): The path to the file containing the records.

    Returns:
        DatasetRecords: The DatasetRecords object created from the disk path.

    """
    records = JsonIO._records_from_json(path=path)
    self.update(records=records)
    return self

to_datasets()

Export the records to a HFDataset.

Returns:

    HFDataset: The dataset containing the records.

Source code in src/argilla_sdk/records/_dataset_records.py
def to_datasets(self) -> HFDataset:
    """
    Export the records to a HFDataset.

    Returns:
        The dataset containing the records.

    """
    records = list(self(with_suggestions=True, with_responses=True))
    return HFDatasetsIO.to_datasets(records=records)

to_dict(flatten=False, orient='names')

Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().

Parameters:

    flatten (bool, default: False): The structure of the exported dictionary.
        - True: The record fields, metadata, suggestions and responses will be flattened.
        - False: The record fields, metadata, suggestions and responses will be nested.
    orient (str, default: 'names'): The orientation of the exported dictionary.
        - "names": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.
        - "index": The keys of the dictionary will be the id of the records.

Returns:

    A dictionary of records.

Source code in src/argilla_sdk/records/_dataset_records.py
def to_dict(self, flatten: bool = False, orient: str = "names") -> Dict[str, Any]:
    """
    Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().

    Parameters:
        flatten (bool): The structure of the exported dictionary.
            - True: The record fields, metadata, suggestions and responses will be flattened.
            - False: The record fields, metadata, suggestions and responses will be nested.
        orient (str): The orientation of the exported dictionary.
            - "names": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.
            - "index": The keys of the dictionary will be the id of the records.
    Returns:
        A dictionary of records.

    """
    records = list(self(with_suggestions=True, with_responses=True))
    data = GenericIO.to_dict(records=records, flatten=flatten, orient=orient)
    return data

to_json(path)

Export the records to a file on disk.

Parameters:

    path (str, required): The path to the file to save the records.

Returns:

    Path: The path to the file where the records were saved.

Source code in src/argilla_sdk/records/_dataset_records.py
def to_json(self, path: Union[Path, str]) -> Path:
    """
    Export the records to a file on disk.

    Parameters:
        path (str): The path to the file to save the records.

    Returns:
        The path to the file where the records were saved.

    """
    records = list(self(with_suggestions=True, with_responses=True))
    return JsonIO.to_json(records=records, path=path)

to_list(flatten=False)

Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().

Parameters:

    flatten (bool, default: False): Whether to flatten the dictionary and use dot notation for nested keys like suggestions and responses.

Returns:

    List[Dict[str, Any]]: A list of dictionaries of records.

Source code in src/argilla_sdk/records/_dataset_records.py
def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:
    """
    Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().

    Parameters:
        flatten (bool): Whether to flatten the dictionary and use dot notation for nested keys like suggestions and responses.

    Returns:
        A list of dictionaries of records.
    """
    records = list(self(with_suggestions=True, with_responses=True))
    data = GenericIO.to_list(records=records, flatten=flatten)
    return data

update(records, mapping=None, user_id=None, batch_size=DEFAULT_BATCH_SIZE)

Update records in a dataset on the server using the provided records and matching based on the external_id or id.

Parameters:

    records (Union[dict, List[dict], Record, List[Record], HFDataset], required): A dictionary or a list of dictionaries representing the records to be updated in the dataset. Records are defined as dictionaries with keys corresponding to the fields in the dataset schema. Ids or external_ids should be provided to identify the records to be updated.
    mapping (Optional[Dict[str, str]], default: None): A dictionary that maps the keys in the records to the fields in the dataset schema.
    user_id (Optional[UUID], default: None): The user id to be associated with the records. If not provided, the current user id is used.
    batch_size (int, default: DEFAULT_BATCH_SIZE): The number of records to send in each batch. The default is 256.

Returns:

    List[Record]: A list of Record objects representing the updated records.

Source code in src/argilla_sdk/records/_dataset_records.py
def update(
    self,
    records: Union[dict, List[dict], Record, List[Record], HFDataset],
    mapping: Optional[Dict[str, str]] = None,
    user_id: Optional[UUID] = None,
    batch_size: int = DEFAULT_BATCH_SIZE,
) -> List[Record]:
    """Update records in a dataset on the server using the provided records
        and matching based on the external_id or id.

    Parameters:
        records: A dictionary or a list of dictionaries representing the records
                 to be updated in the dataset. Records are defined as dictionaries
                 with keys corresponding to the fields in the dataset schema. Ids or
                 external_ids should be provided to identify the records to be updated.
        mapping: A dictionary that maps the keys in the records to the fields in the dataset schema.
        user_id: The user id to be associated with the records. If not provided, the current user id is used.
        batch_size: The number of records to send in each batch. The default is 256.

    Returns:
        A list of Record objects representing the updated records.

    """
    record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id or self.__client.me.id)
    batch_size = self._normalize_batch_size(
        batch_size=batch_size,
        records_length=len(record_models),
        max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,
    )

    created_or_updated = []
    records_updated = 0
    for batch in range(0, len(record_models), batch_size):
        self.log(message=f"Sending records from {batch} to {batch + batch_size}.")
        batch_records = record_models[batch : batch + batch_size]
        models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)
        created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])
        records_updated += updated

    records_created = len(created_or_updated) - records_updated
    self.log(
        message=f"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}",
        level="info",
    )

    return created_or_updated