Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IndexOutOfBoundsException when storage locations are empty #16439

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

abhishekrb19
Copy link
Contributor

Storage locations can be empty if not configured on a server. If left unconfigured, this can cause an IndexOutOfBoundsException when segments are assigned to a server by the coordinator.

org.apache.druid.server.coordination.SegmentLoadDropHandler - Failed to load segment for dataSource: {exceptionType=java.lang.IndexOutOfBoundsException, exceptionMessage=Index: 0, class=org.apache.druid.server.coordination.SegmentLoadDropHandler, segment=DataSegment{binaryVersion=9, id=foo_2024-03-01T00:00:00.000Z_2024-03-02T00:00:00.000Z_2024-04-30T01:56:26.606Z,.., size=1063}}
java.lang.IndexOutOfBoundsException: Index: 0
	at java.base/java.util.Collections$EmptyList.get(Collections.java:4586) ~[?:?]
	at org.apache.druid.segment.loading.SegmentLoaderConfig.getInfoDir(SegmentLoaderConfig.java:123) ~[druid-server-31.0.0-SNAPSHOT.jar:31.0.0-SNAPSHOT]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:290) ~[druid-server-31.0.0-SNAPSHOT.jar:31.0.0-SNAPSHOT]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:263) ~[druid-server-31.0.0-SNAPSHOT.jar:31.0.0-SNAPSHOT]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:340) ~[druid-server-31.0.0-SNAPSHOT.jar:31.0.0-SNAPSHOT]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler$1.lambda$addSegment$1(SegmentLoadDropHandler.java:569) ~[druid-server-31.0.0-SNAPSHOT.jar:31.0.0-SNAPSHOT]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]

Changes:

  • Add a check to fail if locations.isEmpty() in getInfoDir(). I'm not sure there exists a meaningful default we can assume when locations is not configured.
  • Add a unit test that fails with the above exception without this patch.
  • Add equals() and hashCode() implementations in SegmentChangeStatus to validate equality.
  • Change Preconditions.checkNotNull() to a DruidException.defensive().

This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

storage locations can be empty if not configured. This can cause
IndexOutOfBoundsException on nodes where segment assignment can happen.
@abhishekrb19 abhishekrb19 requested a review from kfaraz May 13, 2024 14:33
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, @abhishekrb19 ! The docs do mention that druid.segmentCache.locations cannot be null or empty but there doesn't seem to be an existing validation in place.

Just to double check, did you actually run into this issue or did you discover it while reading through the code?

Comment on lines +79 to +95
public boolean equals(Object o)
{
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
SegmentChangeStatus that = (SegmentChangeStatus) o;
return state == that.state && Objects.equals(failureCause, that.failureCause);
}

@Override
public int hashCode()
{
return Objects.hash(state, failureCause);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would generally prefer to add equals/hashCode when it is needed for the actual implementation or if the class has several fields and it makes testing easier.

Since we currently have only 2 fields in this class, we can probably skip it for the time being. Especially since there is only one test which is using this.

if (locations.isEmpty()) {
throw DruidException.forPersona(DruidException.Persona.OPERATOR)
.ofCategory(DruidException.Category.NOT_FOUND)
.build("storage locations are empty. At least one storage path must be specified "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.build("storage locations are empty. At least one storage path must be specified "
.build("Storage locations are empty. At least one storage path must be specified "

@@ -54,7 +55,10 @@ private SegmentChangeStatus(
@JsonProperty("failureCause") @Nullable String failureCause
)
{
this.state = Preconditions.checkNotNull(state, "state must be non-null");
if (state == null) {
throw DruidException.defensive("state must be non-null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw DruidException.defensive("state must be non-null");
throw DruidException.defensive("SegmentChangeStatus must have a non-null state");

@@ -567,6 +568,48 @@ public void testProcessBatchDuplicateLoadRequestsWhenFirstRequestFailsSecondRequ
segmentLoadDropHandler.stop();
}

@Test(timeout = 60_000L)
public void testProcessBatchWithDefaultNoStorageLocations() throws Exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void testProcessBatchWithDefaultNoStorageLocations() throws Exception
public void testProcessBatchWithEmptyStorageLocations() throws Exception

List<DataSegmentChangeResponse> result = future.get();
Assert.assertEquals(1, result.size());

SegmentChangeStatus expectedFailedStatus = SegmentChangeStatus.failed(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense to fail every segment load request. The historical should just fail to start up in the first place.

One way to do that would be to add a constructor to SegmentLoaderConfig and do the validation there. Another method could be to add a validateLocations() method to SegmentLoaderConfig and invoke it from a (probably guice-injected) class which is initialized on startup.

@abhishekrb19
Copy link
Contributor Author

Thanks for the review, @kfaraz. After digging up a bit more, I see there are some more changes needed to make it work for tasks, where the task locations aren't used/injected correctly in SegmentLoadDropHandler. Also, yes, the docs for this configuration are stale: https://druid.apache.org/docs/latest/configuration/#storing-segments.

Just to double check, did you actually run into this issue or did you discover it while reading through the code?

I ran into IndexOutOfBoundsException by accident when I didn't set the storage locations on the brokers, which should be optional.

Marking this as draft until the other things are fleshed out and then I will rethink this approach to enforce the storage locations validation for historicals.

@abhishekrb19 abhishekrb19 marked this pull request as draft May 15, 2024 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants