FunctionDataSource does not allow function with 3 positional arguments thus shuffling does not work #307

Open
marton-avrios opened this issue Aug 23, 2022 · 2 comments


@marton-avrios

When a FunctionDataSource is created, it checks that the function has only 2 positional arguments. For shuffling to be used, the function should also accept a third argument, seed or seeds. Otherwise an exception is thrown when trying to pass shuffle=True to get_dataset().

_validate_args(dataset_fn, ["split", "shuffle_files"])

Also, later on it only allows seed and not seeds. But this never takes effect, since the whole thing fails during creation.

_validate_args(self._dataset_fn, ["split", "shuffle_files", "seed"])

@marton-avrios marton-avrios changed the title FunctionDataSource does not allow function with 3 positional arguments shuffle does not work FunctionDataSource does not allow function with 3 positional arguments thus shuffling does not work Aug 23, 2022
@gauravmishra
Collaborator

gauravmishra commented Dec 3, 2022

Hi @marton-avrios, this is working as intended, if I understand correctly. The first validation checks that the dataset_fn has at least split and shuffle_files as args, since seed may not be an arg. The second validation is triggered only when the user passes a seed when loading the dataset, meaning that the dataset_fn is expected to have a "seed" arg. "seeds" isn't supported. If your dataset_fn needs multiple seeds, then you can create new ones from the initial seed.
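A minimal sketch of deriving several seeds from one initial seed, for illustration only. This is not code from the thread or from the library: derive_seeds and the toy dataset_fn below are hypothetical, and the "dataset" is a plain Python list standing in for a real one.

```python
import random


def derive_seeds(seed, n):
    """Derive n reproducible child seeds from a single initial seed (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.randrange(2**31) for _ in range(n)]


def dataset_fn(split, shuffle_files, seed=None):
    """Toy dataset function accepting split, shuffle_files and a single seed."""
    # Two independent seeds derived from the one seed that is passed in.
    shuffle_seed, augment_seed = derive_seeds(seed, 2)
    examples = list(range(10))  # stand-in for real examples
    if shuffle_files:
        random.Random(shuffle_seed).shuffle(examples)
    return examples, augment_seed
```

With a fixed seed, both the shuffled order and the second derived seed are reproducible across calls; with seed=None, Python's random module falls back to system entropy.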

@gauravmishra
Collaborator

Maybe the fn needs a better name and documentation: it doesn't validate that the fn has exactly the same args as expected_pos_args, only that its first len(expected_pos_args) args are exactly the same.
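The prefix check described above could look roughly like the sketch below. It only illustrates the behavior as stated in this comment. validate_arg_prefix is a hypothetical name, not the library's _validate_args, and the real function may perform additional checks.

```python
import inspect


def validate_arg_prefix(fn, expected_pos_args):
    """Check that fn's first len(expected_pos_args) argument names match exactly."""
    expected = tuple(expected_pos_args)
    actual = tuple(inspect.getfullargspec(fn).args)
    if actual[:len(expected)] != expected:
        raise ValueError(
            f"'{fn.__name__}' must have initial args {expected}, got {actual}")


def two_arg_fn(split, shuffle_files):
    return None


def seeded_fn(split, shuffle_files, seed=None):
    return None


# Both pass the two-name check: only the leading names are compared.
validate_arg_prefix(two_arg_fn, ["split", "shuffle_files"])
validate_arg_prefix(seeded_fn, ["split", "shuffle_files"])
# Only the function that declares seed passes the three-name check.
validate_arg_prefix(seeded_fn, ["split", "shuffle_files", "seed"])
```

Under this sketch, calling validate_arg_prefix(two_arg_fn, ["split", "shuffle_files", "seed"]) raises ValueError, which mirrors the case where a seed is passed to a function that does not declare one.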
