Skip to content

Conversation

@0UserName
Copy link

@0UserName 0UserName commented Jan 28, 2025

Continuing the discussion: #5415.

For a number of needs there are no public methods, so calls are implemented through UnsafeAccessor (see the Accessors directory).

I changed the access modifiers for WriteAsObject and GetSizeAsObject methods in PgConverter, and added GetCompositeType to PgSerializerOptions.

This will avoid using UsafeAccessor in the plugin.

@0UserName 0UserName force-pushed the expose_api_for_datatable_plugin branch from eeddc9b to 5cca138 Compare January 28, 2025 13:59
@0UserName 0UserName marked this pull request as ready for review January 30, 2025 16:23
@roji
Copy link
Member

roji commented Jan 30, 2025

/cc @NinoFloris

@0UserName
Copy link
Author

0UserName commented Feb 4, 2025

And one more potential addition:

private void AdaptDataTypeName()
{
    if (Value is DataTable dt && (!string.IsNullOrWhiteSpace(dt.TableName) || !string.IsNullOrWhiteSpace(_dataTypeName)))
    {
        _dataTypeName = Internal.Postgres.DataTypeName.FromDisplayName(dt.TableName ?? _dataTypeName!).ToArrayName();
    }
}

Call of this method should be placed at the beginning of ResolveTypeInfo.


This will allow:

  • No need to explicitly specify DataTypeName for NpgsqlParameter

  • Use Dapper, since the need to explicitly specify the type makes its use less attractive

  • No changes are required when migrating from SQL Server, because there is no need to use PostgreSQL-specific array notation

What do you think?

@NinoFloris
Copy link
Member

NinoFloris commented Feb 4, 2025

I changed the access modifiers for WriteAsObject and GetSizeAsObject methods in PgConverter, and added GetCompositeType to PgSerializerOptions.

These methods are explicitly internal. They are not part of the publicly accessible api surface as direct calls into these methods will miss important edge cases. You can of course use reflection to access anything, but as always when you resort to that you should try to understand why things aren't already setup the way you believe it should be.

GetSizeAsObject really is a very good example of this. It's already publicly accessible via the GetSizeOrDbNullAsObject extension method that takes into account many of the peculiarities that apply once values are boxed, and as such I wouldn't want to make the more error prone internal method public, that just leads to a pit of failure.

WriteAsObject on the other hand could be something we can consider making public, but certainly not directly. The internal signature containing bool async is unidiomatic and purely there for code size optimization.

These methods on PgConverter would be better candidates to make public:

internal void WriteAsObject(PgWriter writer, object value);
internal ValueTask WriteAsObjectAsync(PgWriter writer, object value, CancellationToken cancellationToken = default);

Let me think a bit whether this could cause any problems down the road, as we'd be somewhat committing to keeping this working in this exact way.

and added GetCompositeType to PgSerializerOptions.

Yes, I believe this has come up before as well. Internally we have access to the entire DatabaseInfo through the PgSerializerOptions but so far I've refrained from making it public to deter plugin writers from building resolvers that are dependent on specific database information, as it's often the incorrect approach. Most resolvers should refer to their supported set of pg types statically. For plugins enhancing functionality matching some predicate on pg types - where you're targetting all composite types, or any type containing a timestamp type, etc. - you absolutely need access to it as the DataTypeName alone won't be enough.

@roji what do you think? Should we just make the DatabaseInfo getter public, or should we move this to some static class akin to CollectionMarshal/MemoryMarshal in the BCL? I'm not sure it's worth it but it does signal to callers they might need to think about their approach a bit more.

And one more potential addition:

Regarding your last suggestion about AdaptDataTypeName, that would not be something we'd add to NpgsqlParameter just to enable an external plugin. We do have a mechanism to handle value dependent writes: PgConverterResolver. Npgsql uses this to pick the postgres type for a DateTime value based on its Kind. You want to use that mechanism to do something similar for DataTables, using the TableName to look up the right converter. That should also remove the need to register DataTable derived types as you can just use a normal DataTable for everything.

I also noticed in https://github.com/0UserName/npgsql.tvp/blob/master/Npgsql.Tvp/Internal/Resolvers/DataTableTypeInfoResolverFactory.cs you added your DataTable mapping logic to the resolver returned by CreateResolver. Given all mappings only apply to pg array types you might want to move things to be under CreateArrayResolver. Then CreateResolver would just return an empty resolver. That makes sure your resolver ends up in the right location in the chain and properly gets excluded by NpgsqlSlimDataSourceBuilder until EnableArrays is called.

What you're trying enable is fairly complicated and hits a lot of the nuances that make the entire type mapping business such an intricate topic, so don't let my feedback deter you. I fully believe that with some careful work it should all be possible within the current architecture, and I agree you need access to a few methods that are currently internal. We just have to make sure anything we add makes sense in the larger picture as well.

My suggestion would be to leave the PR for now, focus on getting your plugin into a complete state while trying to use as few internals as you can get away with. Once you're happy (or really running into concrete missing features) we can revisit and see what we should open up. Sounds good?

@0UserName
Copy link
Author

GetSizeAsObject really is a very good example of this. It's already publicly accessible via the GetSizeOrDbNullAsObject extension method that takes into account many of the peculiarities that apply once values are boxed, and as such I wouldn't want to make the more error prone internal method public, that just leads to a pit of failure.

But PgConverterExtensions is internal.


WriteAsObject on the other hand could be something we can consider making public, but certainly not directly. The internal signature containing bool async is unidiomatic and purely there for code size optimization.

These methods on PgConverter would be better candidates to make public:

internal void WriteAsObject(PgWriter writer, object value);
internal ValueTask WriteAsObjectAsync(PgWriter writer, object value, CancellationToken cancellationToken = default);

Make sense.. So, I will revert modifiers and make methods you mentioned as public, as well as PgConverterExtensions. Is PgConverterExtensions a good candidate for this?


Most resolvers should refer to their supported set of pg types statically. For plugins enhancing functionality matching some predicate on pg types - where you're targetting all composite types, or any type containing a timestamp type, etc. - you absolutely need access to it as the DataTypeName alone won't be enough.

There can be dozens or even hundreds of composite types/table types. For this reason, the implementation of resolving is dynamic in nature.


Regarding your last suggestion about AdaptDataTypeName, that would not be something we'd add to NpgsqlParameter just to enable an external plugin.

Well, it's not really about the plugin. If DataTable support is planned, then something like this would be needed one way or another, since using array notation is not very convenient.

In general, I was interested in the idea of ​​implicit conversion of the type name.

@0UserName
Copy link
Author

0UserName commented Feb 4, 2025

We do have a mechanism to handle value dependent writes: PgConverterResolver. Npgsql uses this to pick the postgres type for a DateTime value based on its Kind. You want to use that mechanism to do something similar for DataTables, using the TableName to look up the right converter. That should also remove the need to register DataTable derived types as you can just use a normal DataTable for everything.

I'm not sure if I understand correctly. DataTypeName that will be used by the command when sending data to the server is specified in advance. If you do not specify it from the client code, then the type mapping in the resolver will not work (unless you list all composite/table types in advance).

If you specify the type name, but do not use array notation, then an error will occur when sending data to the server, because the name is passed as a regular type, while the server expects an array.

That is, firstly, I tried to replace one DataTypeName with another, and secondly, to avoid explicitly setting. Is it possible to implement this using built-in features?

@0UserName
Copy link
Author

0UserName commented Feb 4, 2025

My suggestion would be to leave the PR for now, focus on getting your plugin into a complete state while trying to use as few internals as you can get away with. Once you're happy (or really running into concrete missing features) we can revisit and see what we should open up. Sounds good?

I'm afraid I just have no choice :) At the very least, I would like to have comments on the clarifications and a more or less acceptable PR (at least taking into account the above-mentioned comments)

And yes, thank you for the comments.

@NinoFloris
Copy link
Member

NinoFloris commented Feb 4, 2025

But PgConverterExtensions is internal.

Ah I overlooked that, it may make sense to make those extensions public.

There can be dozens or even hundreds of composite types/table types. For this reason, the implementation of resolving is dynamic in nature.

If you carefully read my reply you'd read I agree there are resolvers where you do need access to the runtime set of pg types. However, in general, resolvers provide a static set of mappings from CLR to postgres type and vice versa. A resolver describes all the mappings it is capable of, not just which types some specific database has available. You can take a look at our builtin resolvers, almost all of them are just a long list of concrete pg type names mapped to concrete CLR types. For plugins enabling something like a new postgres plugin (pgvector, ltree, postgis) the same applies, and you'll see that those resolvers just contain mappings for CLR type(s) to e.g. "pgvector". That's really the primary use for having this infrastructure pubternal at all.

Again, I agree there is a place for predicate based resolvers (and we also have some inside Npgsql, to handle any pg enum type for instance) but that was not the original target audience.

Well, it's not really about the plugin. If DataTable support is planned, then something like this would be needed one way or another, since using array notation is not very convenient.

There is no DataTable support planned as far as I'm aware? Any support would live entirely in your plugin. Even if we would add some DataTable support in Npgsql we'd do so through the PgConverterResolver mechanism. That mechanism really has nothing to do with requiring array notation or not.

PgConverterResolver exists to defer postgres type resolution until a user value is presented. Specifically when no other information is given (so no NpgsqlDbType or DataTypeName). An IPgTypeInfoResolver for DataTable would see Type == DataTable, DataTypeName == null and return a PgResolverTypeInfo with an absent/undecided PgTypeId. This is then internally handled during parameter binding to feed the value of the parameter into the PgResolverTypeInfo, which calls your PgConverterResolver implementation. At this point your implementation is able to inspect the DataTable and decide which PgTypeId it should be mapped to. How you do that is entirely up to you, but likely you just have a map from instance.TableName to PgTypeId. That map can be populated statically, but in your case you probably want to pass DatabaseInfo down into the PgConverterResolver so it can map any composite 'TableName' to the right Oid.

@0UserName
Copy link
Author

PgConverterResolver exists to defer postgres type resolution until a user value is presented. Specifically when no other information is given (so no NpgsqlDbType or DataTypeName). An IPgTypeInfoResolver for DataTable would see Type == DataTable, DataTypeName == null and return a PgResolverTypeInfo with an absent/undecided PgTypeId. This is then internally handled during parameter binding to feed the value of the parameter into the PgResolverTypeInfo, which calls your PgConverterResolver implementation. At this point your implementation is able to inspect the DataTable and decide which PgTypeId it should be mapped to. How you do that is entirely up to you, but likely you just have a map from instance.TableName to PgTypeId. That map can be populated statically, but in your case you probably want to pass DatabaseInfo down into the PgConverterResolver so it can map any composite 'TableName' to the right Oid.

Thanks for detailed explanation. I solved this problem. Unfortunately, without access to PostgresCompositeType, a number of opportunities are lost:

  • We cannot make sure that DataTable contains all the necessary columns before accessing the DB

  • If DataTable contains more columns than PostgresCompositeType, then we cannot automatically trim extra columns

  • We cannot pick columns by name so that the order of writing is performed in the order of the fields

@NinoFloris
Copy link
Member

I solved this problem

I'm curious how you did that, can you explain what you did?

Unfortunately, without access to PostgresCompositeType, a number of opportunities are lost:

Yes, you still need access to DatabaseInfo to be able to get the postgres type information. I haven't said anything to the contrary.

@0UserName
Copy link
Author

0UserName commented Feb 4, 2025

I haven't said anything to the contrary.

yy. I just wrote why it's important. Maybe it's obvious, but still.


I'm curious how you did that, can you explain what you did?

I implemented custom PgConverterResolver that calls GetArrayTypeId using TableName: Commit

@0UserName 0UserName force-pushed the expose_api_for_datatable_plugin branch from 5cca138 to 5eb664a Compare February 4, 2025 23:36
@0UserName
Copy link
Author

So, I will revert modifiers and make methods you mentioned as public, as well as PgConverterExtensions.

PR updated.

@0UserName
Copy link
Author

Is there a chance that some of the changes will be accepted/merged?

@NinoFloris
Copy link
Member

NinoFloris commented Nov 18, 2025

I have started improving this area in #6316 and #6317. With that out of the way I'll post up a PR opening up new api surface. Those new apis should give you what you need to create the datatable to composite support you want, while dealing with the edge-cases correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants