Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error hint for getproperty(::Dict, ::Symbol) #54504

Merged
merged 19 commits into from
Jun 12, 2024

Conversation

arhik
Copy link
Sponsor Contributor

@arhik arhik commented May 17, 2024

error hint for getproperty(::Dict, ::Symbol)

addresses #53618

@arhik arhik changed the title Update errorshow.jl error hint for getproperty(::Dict, ::Symbol) May 17, 2024
@arhik
Copy link
Sponsor Contributor Author

arhik commented May 17, 2024

@oscardssmith This is the least invasive commit with computational overhead on getproperty that I could come up with. I see a 10ns increase in getproperty calls. On mac M1 its jumping from 13ns to 24ns on average. Since Other exceptions are not carrying the information we need, I introduced an Exception type. Not sure if it's worth it. If it is a standard requirement we can make a separate commit on the MemberAccessError exception. (Should come up with better name may be). I am open to discussion on this PR.

@KristofferC
Copy link
Sponsor Member

If this is to be done it has to be done without any overhead at all for the working case. Isn't it possible to add an error hint for the already thrown exception or is it missing something?

@arhik
Copy link
Sponsor Contributor Author

arhik commented May 17, 2024

@KristofferC I tried to fit other Exceptions available. I will take a look again. Even if there is an existing exception that fits the purpose, we will have to filter symbols that are actually Dict field members. So the problem should still exist. I am starting to think if this is necessary feature.

EDIT:
Sorry I think I misunderstood you. Already thrown exception should go through similar strategy I think.

Actually there is nuance to it. I will explain the overhead part clearly in the first message. Allow me some time to do that. Simply put user end code like keys(dict) won't see any overhead. but dict.jl using dict.keys anywhere will see an overhead.

base/dict.jl Outdated Show resolved Hide resolved
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this hint! We should definitely add it to the language, though it will take a bit of work to make it fit in nicely with existing systems.

base/boot.jl Outdated Show resolved Hide resolved
base/errorshow.jl Outdated Show resolved Hide resolved
base/errorshow.jl Outdated Show resolved Hide resolved
base/dict.jl Outdated
Comment on lines 119 to 125
function getproperty(d::Dict, x::Symbol)
if x in propertynames(d)
return getfield(d, x)
else
throw(MemberAccessError(typeof(d), x))
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to add an UndefFieldError or MemberAccessError, instead of special casing Dict, it makes more sense to update the error thrown by getfield itself. This would require digging into the c code:

diff --git a/base/boot.jl b/base/boot.jl
index 4a59bf6279..1569f6277e 100644
--- a/base/boot.jl
+++ b/base/boot.jl
@@ -205,8 +205,8 @@ export
     ErrorException, BoundsError, DivideError, DomainError, Exception,
     InterruptException, InexactError, OutOfMemoryError, ReadOnlyMemoryError,
     OverflowError, StackOverflowError, SegmentationFault, UndefRefError, UndefVarError,
-    TypeError, ArgumentError, MethodError, AssertionError, LoadError, InitError,
-    UndefKeywordError, ConcurrencyViolationError,
+    UndefFieldError, TypeError, ArgumentError, MethodError, AssertionError, LoadError,
+    InitError, UndefKeywordError, ConcurrencyViolationError,
     # AST representation
     Expr, QuoteNode, LineNumberNode, GlobalRef,
     # object model functions
@@ -388,6 +388,11 @@ struct UndefKeywordError <: Exception
     var::Symbol
 end
 
+struct UndefFieldError <: Exception
+    type::DataType
+    field::Symbol
+end
+
 const typemax_UInt = Intrinsics.sext_int(UInt, 0xFF)
 const typemax_Int = Core.Intrinsics.udiv_int(Core.Intrinsics.sext_int(Int, 0xFF), 2)
 
diff --git a/src/datatype.c b/src/datatype.c
index abbec420bb..1a4694a594 100644
--- a/src/datatype.c
+++ b/src/datatype.c
@@ -1736,7 +1736,7 @@ JL_DLLEXPORT int jl_field_index(jl_datatype_t *t, jl_sym_t *fld, int err)
         }
     }
     if (err)
-        jl_has_no_field_error(t->name->name, fld);
+        jl_has_no_field_error(t, fld);
     return -1;
 }
 
diff --git a/src/jl_exported_data.inc b/src/jl_exported_data.inc
index 79ff437841..90920df78a 100644
--- a/src/jl_exported_data.inc
+++ b/src/jl_exported_data.inc
@@ -138,6 +138,7 @@
     XX(jl_uint8_type) \
     XX(jl_undefref_exception) \
     XX(jl_undefvarerror_type) \
+    XX(jl_undeffielderror_type) \
     XX(jl_unionall_type) \
     XX(jl_uniontype_type) \
     XX(jl_upsilonnode_type) \
diff --git a/src/jltypes.c b/src/jltypes.c
index 420b88aeb7..89607a97de 100644
--- a/src/jltypes.c
+++ b/src/jltypes.c
@@ -3700,6 +3700,7 @@ void post_boot_hooks(void)
     jl_diverror_exception  = jl_new_struct_uninit((jl_datatype_t*)core("DivideError"));
     jl_undefref_exception  = jl_new_struct_uninit((jl_datatype_t*)core("UndefRefError"));
     jl_undefvarerror_type  = (jl_datatype_t*)core("UndefVarError");
+    jl_undeffielderror_type  = (jl_datatype_t*)core("UndefFieldError");
     jl_atomicerror_type    = (jl_datatype_t*)core("ConcurrencyViolationError");
     jl_interrupt_exception = jl_new_struct_uninit((jl_datatype_t*)core("InterruptException"));
     jl_boundserror_type    = (jl_datatype_t*)core("BoundsError");
diff --git a/src/julia.h b/src/julia.h
index 0d46f15776..2501040797 100644
--- a/src/julia.h
+++ b/src/julia.h
@@ -868,6 +868,7 @@ extern JL_DLLIMPORT jl_datatype_t *jl_initerror_type JL_GLOBALLY_ROOTED;
 extern JL_DLLIMPORT jl_datatype_t *jl_typeerror_type JL_GLOBALLY_ROOTED;
 extern JL_DLLIMPORT jl_datatype_t *jl_methoderror_type JL_GLOBALLY_ROOTED;
 extern JL_DLLIMPORT jl_datatype_t *jl_undefvarerror_type JL_GLOBALLY_ROOTED;
+extern JL_DLLIMPORT jl_datatype_t *jl_undeffielderror_type JL_GLOBALLY_ROOTED;
 extern JL_DLLIMPORT jl_datatype_t *jl_atomicerror_type JL_GLOBALLY_ROOTED;
 extern JL_DLLIMPORT jl_datatype_t *jl_missingcodeerror_type JL_GLOBALLY_ROOTED;
 extern JL_DLLIMPORT jl_datatype_t *jl_lineinfonode_type JL_GLOBALLY_ROOTED;
@@ -2023,7 +2024,7 @@ JL_DLLEXPORT void JL_NORETURN jl_type_error_rt(const char *fname,
                                                jl_value_t *ty JL_MAYBE_UNROOTED,
                                                jl_value_t *got JL_MAYBE_UNROOTED);
 JL_DLLEXPORT void JL_NORETURN jl_undefined_var_error(jl_sym_t *var, jl_value_t *scope JL_MAYBE_UNROOTED);
-JL_DLLEXPORT void JL_NORETURN jl_has_no_field_error(jl_sym_t *type_name, jl_sym_t *var);
+JL_DLLEXPORT void JL_NORETURN jl_has_no_field_error(jl_datatype_t *t, jl_sym_t *var);
 JL_DLLEXPORT void JL_NORETURN jl_atomic_error(char *str);
 JL_DLLEXPORT void JL_NORETURN jl_bounds_error(jl_value_t *v JL_MAYBE_UNROOTED,
                                               jl_value_t *t JL_MAYBE_UNROOTED);
diff --git a/src/rtutils.c b/src/rtutils.c
index f18a1ac112..29fa145880 100644
--- a/src/rtutils.c
+++ b/src/rtutils.c
@@ -152,9 +152,10 @@ JL_DLLEXPORT void JL_NORETURN jl_undefined_var_error(jl_sym_t *var, jl_value_t *
     jl_throw(jl_new_struct(jl_undefvarerror_type, var, scope));
 }
 
-JL_DLLEXPORT void JL_NORETURN jl_has_no_field_error(jl_sym_t *type_name, jl_sym_t *var)
+JL_DLLEXPORT void JL_NORETURN jl_has_no_field_error(jl_datatype_t *t, jl_sym_t *var)
 {
-    jl_errorf("type %s has no field %s", jl_symbol_name(type_name), jl_symbol_name(var));
+    // jl_errorf("type %s has no field %s", jl_symbol_name(t->name->name), jl_symbol_name(var));
+    jl_throw(jl_new_struct(jl_undeffielderror_type, t, var));
 }
 
 JL_DLLEXPORT void JL_NORETURN jl_atomic_error(char *str) // == jl_exceptionf(jl_atomicerror_type, "%s", str)
diff --git a/src/staticdata.c b/src/staticdata.c
index f6461e0b42..eaa5d66754 100644
--- a/src/staticdata.c
+++ b/src/staticdata.c
@@ -235,6 +235,7 @@ jl_value_t **const*const get_tags(void) {
         INSERT_TAG(jl_loaderror_type);
         INSERT_TAG(jl_initerror_type);
         INSERT_TAG(jl_undefvarerror_type);
+        INSERT_TAG(jl_undeffielderror_type);
         INSERT_TAG(jl_stackovf_exception);
         INSERT_TAG(jl_diverror_exception);
         INSERT_TAG(jl_interrupt_exception);

(This approach also has no overhead)

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow. Can I borrow this and check ?

@JeffBezanson
Copy link
Sponsor Member

This needs a different name. We use "undef" to mean a null reference. Maybe just FieldError or PropertyError?

@arhik
Copy link
Sponsor Contributor Author

arhik commented May 22, 2024

I like FieldError. Going with it.

src/jltypes.c Outdated
Comment on lines 3727 to 3731
jl_undefvarerror_type = (jl_datatype_t*)core("UndefVarError");
jl_fielderror_type = (jl_datatype_t*)core("FieldError");
jl_atomicerror_type = (jl_datatype_t*)core("ConcurrencyViolationError");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
jl_undefvarerror_type = (jl_datatype_t*)core("UndefVarError");
jl_fielderror_type = (jl_datatype_t*)core("FieldError");
jl_atomicerror_type = (jl_datatype_t*)core("ConcurrencyViolationError");
jl_undefvarerror_type = (jl_datatype_t*)core("UndefVarError");
jl_fielderror_type = (jl_datatype_t*)core("FieldError");
jl_atomicerror_type = (jl_datatype_t*)core("ConcurrencyViolationError");

Consistent style

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Will batch it.

@LilithHafner LilithHafner added the needs tests Unit tests are required for this change label May 22, 2024
@LilithHafner
Copy link
Member

This looks pretty good! It should have tests to make sure the new error is being thrown and the hint looks right.

@nospecialize
field = exc.field
type = exc.type
if type == :Dict
Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting a type Dict but I was a bit surprised to see I actually get Symbol. I should familiarize with internals more. But if anyone sees this as anomaly let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's seriously broken. I think the struct construction on the C side is buggy.

@arhik
Copy link
Sponsor Contributor Author

arhik commented May 24, 2024

This looks pretty good! It should have tests to make sure the new error is being thrown and the hint looks right.

Added tests. Let me know if they are reasonable.

@arhik
Copy link
Sponsor Contributor Author

arhik commented May 24, 2024

@LilithHafner I raised another PR JuliaSparse/SparseArrays.jl#539 related to this. How do we resolve this dependency. We need SparseArrays PR to be accepted for julia tests to pass. ref (https://buildkite.com/julialang/julia-master/builds/36697#018fa506-1c87-42bf-b902-09ef3cb581ec/826-1173)

LilithHafner pushed a commit to JuliaSparse/SparseArrays.jl that referenced this pull request May 25, 2024
@LilithHafner
Copy link
Member

This PR now includes the changes you made to SparseArrays tests

@LilithHafner LilithHafner removed the needs tests Unit tests are required for this change label May 26, 2024
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the most part, this LGTM, but the fact that the exception's type field is a Symbol when it is type checked to be a Type is very broken.

It would be nice to add a failing test for that, and we definitely can't merge before fixing it.

@arhik
Copy link
Sponsor Contributor Author

arhik commented May 26, 2024

Understood the concern. I didn't get chance to look into it. I will try make time soon.

@arhik arhik force-pushed the dictErrorHint branch 2 times, most recently from 586d3f0 to 25e9ccb Compare May 27, 2024 13:04
@arhik
Copy link
Sponsor Contributor Author

arhik commented May 27, 2024

@LilithHafner The type information is lost even before it comes to jl_new_struct. On inspection found issue in codegen. We didn't make corresponding changes in llvm related emit function signatures.

@gbaraldi I made some codegen changes for FieldError. Its trivial but could have unintended consequences. Your view would help.

https://github.com/JuliaLang/julia/pull/54504/files#diff-62cfb2606c6a323a7f26a3eddfa0bf2b819fa33e094561fee09daeb328e3a1e7R4082

@LilithHafner
Copy link
Member

I'm a little out of my depth here, I would appreciate @gbaraldi's feedback or for @gbaraldi to reccomend an alternative reviewer.

@LilithHafner LilithHafner dismissed their stale review May 27, 2024 14:43

out of date

@arhik arhik force-pushed the dictErrorHint branch 2 times, most recently from ad114f1 to b2676de Compare June 8, 2024 14:48
@arhik
Copy link
Sponsor Contributor Author

arhik commented Jun 8, 2024

@LilithHafner I rebased code. Can you check and validate whenever you are free ?

arhik and others added 7 commits June 10, 2024 00:19
We will have to register handlers for tests separately again.
Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
minor tweaks from suggestions

Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Thank you for tacking this issue and following through when it turned out to be more complex than it initially seemed it would be.

@LilithHafner LilithHafner added the status:merge me PR is reviewed. Merge when all tests are passing label Jun 9, 2024
test/errorshow.jl Outdated Show resolved Hide resolved
Consider using indexing syntax.

# Example
julia> dict = Dict(:$(field)=>5)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems odd to use markdown syntax here since this will just get printed to stderr.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the error hints tend to be quite short and to the point. Something like:

Dictionaries are indexed using the [key] syntax

or something along those lines should be enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For most markdown syntax (e.g. code formatting), I agree with you (see above conversation in this PR). However, the single # header is also reasonable plaintext "syntax".

This is a very niche error case (it only happens when you call my_dict.foo) for some my_dict::Dict, so it is unlikely that this will come across as spammy. The only case I can think of where this extended explanation would be unwelcome is if folks are intentionally attempting to access internal Dict fields and also making typos.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to keep the styling consistent with the rest of the hints for things to be cohesive. Following the example of * on AbstractString here would be the simplest.

@KristofferC KristofferC removed the status:merge me PR is reviewed. Merge when all tests are passing label Jun 10, 2024
@nospecialize
field = exc.field
type = exc.type
if type <: Dict
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be AbstractDict perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. I could implement an AbstractDict that forwards getproperty to getindex.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be said "typically" then.

@tecosaur
Copy link
Contributor

Just to come in here at the 11th hour with a bit of greediness, could

struct FieldError <: Exception
    type::DataType
    field::Symbol
end

be

struct FieldError <: Exception
    val::Any
    field::Symbol
end

I'm thinking this could be useful when the properties of an object depend on the value, not just the type (e.g. DataFrame).

That said, I'm not sure if it's possible to do this without overhead? If so it seems nice to have though.

@arhik
Copy link
Sponsor Contributor Author

arhik commented Jun 11, 2024

Just to come in here at the 11th hour with a bit of greediness, could

struct FieldError <: Exception
    type::DataType
    field::Symbol
end

be

struct FieldError <: Exception
    val::Any
    field::Symbol
end

I'm thinking this could be useful when the properties of an object depend on the value, not just the type (e.g. DataFrame).

That said, I'm not sure if it's possible to do this without overhead? If so it seems nice to have though.

I imagined a separate PropertyError Exception for that use case. I will let others speak on it. In my opinion having a separate Exception like PropertyError similar to FieldError will avoid confusion and also complexity in internal code. This will also be in line with existing convention like fieldnames and propertynames. As I said I will let other speak since I dont have much exposure to design decisions and internals.

You must be familiar with it FYI an implementation similar to first commit in this PR could possibly do the job for the use case you are alluding to. Ex: getproperty line here can be overwritten with custom function and error exception continuation.

@arhik
Copy link
Sponsor Contributor Author

arhik commented Jun 11, 2024

Quickly tried this and pasting this. Let me know if this is what you are implying. (overhead version obviously)

struct PropertyError <: Exception
	val::Any
	x::Symbol
end

using DataFrames

Base.getproperty(d::DataFrame, x::Symbol) = begin
	if x in propertynames(d)
		return d[!, x]
	else
		throw(PropertyError(d, x))
	end
end

function Base.showerror(io::IO, exc::PropertyError)
	@nospecialize
	println(io, "PropertyError: could not access property $(exc.x) on this $(typeof(exc.val)) type\n")
	Base.Experimental.show_error_hints(io, exc)
end

function propertyAccessHandler(io, exc)
	@nospecialize
	x = exc.x
	val = exc.val
	if val isa DataFrame
		println(io, "Looks like there is no column `$x` defined on this DataFrame obj.")
		print(io, "Only following columns are defined on this object: ")
		print(io, "$(propertynames(val))")
	else
		println(io, 
		"""
		A simple else message here
		"""
		)
	end
end


Base.Experimental.register_error_hint(propertyAccessHandler, PropertyError)

# Test

h = DataFrame(ENV)

h.c

image

An implementation without overhead like this PR should do the job.

@oscardssmith
Copy link
Member

I don't think the overhead is a problem here since it's only overhead in throwing a user-facing error.

@oscardssmith
Copy link
Member

@arhik I agree that this would probably be an improvement, but I am tempted to just merge this now and make the change to Any in a followup PR. (how can you not merge when CI is fully green? It's a sign...)

base/errorshow.jl Outdated Show resolved Hide resolved
Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
@LilithHafner LilithHafner added status:merge me PR is reviewed. Merge when all tests are passing domain:error messages Better, more actionable error messages labels Jun 11, 2024
@IanButterworth IanButterworth merged commit 7e5d9a3 into JuliaLang:master Jun 12, 2024
9 checks passed
@oscardssmith oscardssmith added domain:collections Data structures holding multiple items, e.g. sets and removed status:merge me PR is reviewed. Merge when all tests are passing labels Jun 12, 2024
@arhik arhik deleted the dictErrorHint branch June 12, 2024 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:collections Data structures holding multiple items, e.g. sets domain:error messages Better, more actionable error messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants