Normally when people talk about refactoring, they don’t show real code. Instead, you get fake examples, often so contrived that they don’t even resemble real-world code. The reader is left with an idealized vision of what refactoring looks like, where every line of code is under test and every problem neatly fits a pattern from their favorite book.
Well screw that. The real world is messy and it’s about time we acknowledge that. With countless examples of real projects in production use on GitHub, there’s no excuse to invent fake code to demonstrate refactoring.
So that’s what we’re going to do here. In this repository you get to see a real working library be incrementally refactored to improve code quality. A new branch was created for each step along the way so you can see the progression of the code from its original state to something… well not perfect, but better.
Is this the only way to clean up the code? Of course not. So I welcome you to fork this or the original and attempt your own cleanup.
In fact, this is actually the second attempt to refactor this library. The first time through was done as a training exercise and an exploration to see what was possible. Eventually enough knowledge was gained to fix the broken tests, allowing the majority of them to be run for the first time.
In that process, it was discovered that there are places in the code that are highly sensitive to the names of enumerations or the order in which properties appear in a class. As that would be too difficult to unwind, we instead started over with the intent of making a production-grade refactoring.
While the work had to be redone, the knowledge wasn't lost. The original refactoring log was kept, and we refer to it from time to time to guide this effort.
- Rename projects and folders.
- Change supported frameworks to: net472, netstandard2.0, netcoreapp3.1, net6.0
- Fix compiler warnings for package conflicts.
- Add .editorconfig file.
- Fix tests.
- Delete tests marked as ignored.
- General cleanup of project files.
Under .NET 4.6.2, tests failed with a platform-not-supported exception, so the minimum version was left at 4.7.2.
After adding a parameters.json file, all tests were still failing with this error message.

```text
OneTimeSetUp: System.IO.FileNotFoundException : Could not find file 'C:\WINDOWS\system32\parameters.json'.
```

The problem is in this line of code:

```csharp
StreamReader reader = new StreamReader("parameters.json");
```

An application's current directory is not guaranteed, so tests should use absolute paths.

```csharp
var path = Path.Combine(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), "parameters.json");
StreamReader reader = new StreamReader(path);
```

The connection string builder (in the tests, not `SnowflakeDbConnectionStringBuilder`) also needed to be changed to ignore the warehouse and role parameters.
At this point all remaining tests are passing.
Consider this section of `Snowflake.Data.Tests.csproj`:

```xml
<Target Name="CopyCustomContent" AfterTargets="AfterBuild">
  <Copy SourceFiles="parameters.json" DestinationFolder="$(OutDir)" />
  <Copy SourceFiles="App.config" DestinationFolder="$(OutDir)" />
</Target>
```
Does it work? Yes, mostly. Should you do it this way? No.
Set the "Copy to Output Directory" flag so that other developers will understand your intentions.
```xml
<ItemGroup>
  <None Update="App.config">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
  <None Update="parameters.json">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
```

And while we're at it, remove this silliness.

```xml
<ItemGroup>
  <Folder Include="Properties\" />
</ItemGroup>
```

It doesn't break anything, but it makes a non-existent folder appear in Solution Explorer.
- General code formatting with CodeMaid and `.editorconfig`.
- C# 10 features enabled.
- Implicit using statements turned on.
- Change the namespaces to use the `Tortuga` prefix.
- Remove unused namespaces.

A lot of files have unused `using` statements and other basic formatting issues. Having an accurate list of `using` statements is helpful when trying to determine what a class does. So we're going to fix that.

The new Implicit Usings feature is going to be turned on at the same time. This means we don't need to see general-purpose using statements that we don't care about, such as `using System;` and `using System.Collections.Generic;`.
The members in the classes cannot be sorted. Doing so causes the `TestSimpleLargeResultSet` test to fail. The reason has not yet been determined, but it most likely has something to do with serialization.
Part of the readability issue in this project is that the statements are so long that they needed to be broken over multiple lines.

By using the file-scoped namespace declaration

```csharp
namespace Snowflake.Data.Core.Authenticator;
```

instead of the block-scoped form,

```csharp
namespace Snowflake.Data.Core.Authenticator
{
```

we remove a level of indentation. Four spaces don't sound like much, but it can be enough to get everything onto one line. Especially when combined with other techniques that we'll be discussing later.

At the same time, we'll do another round of cleaning for the files that have two sets of using statements.

```csharp
using System.Data;
using System.Data.Common;
using System.Globalization;
using System.Text;

namespace Snowflake.Data.Tests
{
    using NUnit.Framework;
    using Snowflake.Data.Client;
    using Snowflake.Data.Core;
```
As per C# conventions, each class, interface, enum, etc. should be in its own file. This reduces the chances of a merge conflict and makes it easier to find specific classes in the solution explorer.
This reorganization operation created approximately 70 new files.
There are a handful of compiler warnings to clear. The first is to remove static fields that are never used.
Then remove pointless catch blocks.
```csharp
catch (Exception ex)
{
    throw ex;
}
```

`SnowflakeDbException` declares `SqlState`, which shadows a property in the base class for some frameworks.
`HttpRequestMessage.Properties` is obsolete for some frameworks, so a pair of extension methods with compiler constants are used to resolve the conflict.
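A sketch of what that pair could look like. The class and method names here are assumptions; `HttpRequestMessage.Options` is the real .NET 5+ replacement for `Properties`.

```csharp
using System.Net.Http;

static class HttpRequestMessageExtensions
{
    // Hypothetical helper: one call site, two implementations.
    public static void SetOption<T>(this HttpRequestMessage request, string key, T value)
    {
#if NET5_0_OR_GREATER
        request.Options.Set(new HttpRequestOptionsKey<T>(key), value);
#else
        request.Properties[key] = value;
#endif
    }

    public static bool TryGetOption<T>(this HttpRequestMessage request, string key, out T? value)
    {
#if NET5_0_OR_GREATER
        return request.Options.TryGetValue(new HttpRequestOptionsKey<T?>(key), out value);
#else
        if (request.Properties.TryGetValue(key, out var raw) && raw is T typed)
        {
            value = typed;
            return true;
        }
        value = default;
        return false;
#endif
    }
}
```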
This field should go away. There's a perfectly suitable field in the base class to handle this.
It would be nice if we could remove this field, but we can't because no constructor on `DbException` accepts both an error code and an inner exception. So the work-around stays. Though we will rename the field to `_errorCode` to match the property name.
This is a weird one. Instead of just reading the value of the enum directly, it makes an expensive reflection call to get the number from an attribute.
```csharp
[SFErrorAttr(errorCode = 270001)]
INTERNAL_ERROR,
```

```csharp
_errorCode = error.GetAttribute<SFErrorAttr>().errorCode;
```
The easy solution is:

```csharp
readonly SFError _errorCode;
[...]
_errorCode = error;
[...]
public override int ErrorCode => (int)_errorCode;
```

In order for that to work, the `SFError` enum needs to be renumbered.

```csharp
public enum SFError
{
    INTERNAL_ERROR = 270001,
    COLUMN_INDEX_OUT_OF_BOUND = 270002,
    INVALID_DATA_CONVERSION = 270003,
    [...]
```

And now `SFErrorAttr` can be deleted.

Then to make the error codes meaningful, we add this property:

```csharp
public SFError SFErrorCode => _errorCode;
```

Ideally we would shadow the base class's `ErrorCode` method, but we can't override and shadow a method at the same time.
The field `CONNECTION_FAILURE_SSTATE` should be a constant.

The library as a whole is not internationalized, and internationalization is the only reason to have a resource file. Thus we can remove `ErrorMessages.resx`, replacing it with a simple switch block.
A low-level library such as a database driver should not mandate a specific version of a specific logging framework. What if the user of this library prefers something other than Log4Net? Or maybe they are using Log4Net, but an incompatible version of it.
In the first pass of refactoring, we removed Log4Net as the default logger. If someone wants logging, they can implement the `SFLogger` interface with the logger of their choice.

Then we removed the proprietary `SFLogger` interface entirely. Why ask people to implement it when .NET offers a generic `ILogger` interface? It would be really surprising to find a logging framework that didn't support `ILogger` directly.

On the third pass of this round, we looked at the actual messages. For the most part they were not useful. With trivial messages such as "All good", they read more like temporary Console output used during development than something that would be useful to developers using the library.

Furthermore, it is quite unusual for a low-level library to have a logger built into it. Logging at this level offers very little information because, being so low-level, there isn't a significant stack trace.

So in the end we removed the majority of the logging. The one place where it looked valuable was the `OktaAuthenticator`, which has a six-step login process. For that we wrapped it in try-catch blocks that inform the caller which step failed.

The way that works is fairly simple. A `lastStep` variable is updated every few lines and used if there is an exception.
```csharp
catch (Exception ex)
{
    throw new SnowflakeDbException("Okta Authentication in " + lastStep, ex, SFError.INTERNAL_ERROR);
}
```
For some strange reason, Log4Net was also the root package of a dependency chain that included System.Web. Once removed, we also lost `HttpUtility`.

This was replaced by `WebUtility` and `QueryHelpers` (from `Microsoft.AspNetCore.WebUtilities`).
With roughly 60 classes, the `Core` namespace is rather crowded and it's hard to see what features and subsystems are covered by it.

Sometimes the best way to organize code is by guessing.

1. Pick a word. In our first attempt, we'll choose "Chunk".
2. Move every class that has the word in its name into a new folder.
3. Right-click on the folder and select "Sync Namespace with Folder Structure". This will cause the code that uses the moved classes to be modified with a new `using` statement.
4. Review each modified class to see if it should also be moved into the new folder.
5. Repeat steps 3 and 4 until no further classes should be moved.
6. Review the results for overall consistency. Can you describe in plain English what features the folder is responsible for?

Our second word is "Session", which covers the `SFSession` class and its dependencies.

Next up is "ResultSet". Upon reviewing this, it seems that "ResultSet" is the core concept and "Chunk" refers to its implementation details. So we'll rename the `Chunks` folder to `ResultSets`.
Then we go through the remaining classes to see if they are only used by classes in a particular folder. That's a hint suggesting they should be moved.
The enum `SFDataType` belongs in the `Client` folder, as that's where the primary public API for the library lives and it's needed for `SnowflakeDbParameter`.

The file `Snowflake.cs` isn't even a class. It is just assembly-level attributes, so it should be moved to the root and renamed `Assembly.cs`.

After sweeping a few more classes into `ResultSets`, we can start looking at request processing. This namespace will deal with constructing URLs and making outbound requests. Since the naming isn't as consistent as we would need for the previous strategy, we'll instead choose a cornerstone class. Starting with `BaseRestRequest`, we'll iteratively move classes into the new folder based on how they relate to this one.

Since we're calling this folder `RequestProcessing`, it makes sense to rename the `ResultSets` folder to `ResponseProcessing`. Now we have two namespaces with clearly defined roles.

Looking at the usage of `BaseRestRequest`, it is a base class for classes throughout the code. Being a more general-purpose class, we put it back and choose a different cornerstone. The `RestRequester` could be a better candidate. It is only used by `SFSession`, and that's a different concept.

While that seems to fit, `RestRequester` only brings with it the `IRestRequester` interface. So we look for another candidate, `SFRestRequest`. This brings with it `SFStatement`.

After sorting several more classes into the message, request, response, or session folders, we circle back to `BaseRestRequest` and once again move it into `RequestProcessing`. You may find yourself frequently changing your mind when trying to sort out a large namespace. Don't worry about it; it's just part of the process.
In the end we came up with these file counts.
- Core: 19
- Messages: 28
- RequestProcessing: 7
- ResponseProcessing: 12
- ResponseProcessing/Chunks: 17
- Sessions: 12
Where did `ResponseProcessing/Chunks` come from? Well, at 29 files, `ResponseProcessing` started getting big again. And since all of the chunks code is only referenced by `SFResultSet`, it can be easily pulled into its own subsystem.

We left the `Messages` folder large mainly because we rarely look at it. There is no code, only simple DTOs, so charting the interactions between classes isn't as necessary.
If an eager developer decides to alphabetize this enum, it will change the number assignments. As we don't currently know if they are important, it's best to just lock those into place by explicitly numbering them.
```csharp
public enum SFDataType
{
    None = 0,
    FIXED = 1,
    [...]
```
While this isn't necessary for application code, in a library extra care is needed to avoid breaking changes.
This is a weird one. Instead of just reading the value of the enum directly, it makes a reflection call to get the number from an attribute. Not only is this an expensive call, developers may not even realize it is necessary. Most will assume the enum is the actual error code.
```csharp
[SFErrorAttr(errorCode = 270001)]
INTERNAL_ERROR,
```

```csharp
_errorCode = error.GetAttribute<SFErrorAttr>().errorCode;
```

The easy solution is:

```csharp
readonly SFError _errorCode;
[...]
_errorCode = error;
[...]
public override int ErrorCode => (int)_errorCode;
```

In order for that to work, the `SFError` enum needs to be renumbered.

```csharp
public enum SFError
{
    INTERNAL_ERROR = 270001,
    COLUMN_INDEX_OUT_OF_BOUND = 270002,
    INVALID_DATA_CONVERSION = 270003,
    [...]
```

And now `SFErrorAttr` can be deleted.

Then to make the error codes meaningful, we add this property:

```csharp
public SFError SFErrorCode => _errorCode;
```

Ideally, we would shadow the base class's `ErrorCode` method, but we can't override and shadow a method at the same time.

The `SFStatementType` enum is treated the same as `SFError`, except it needs to be a `long`.
This is another example of using attributes on enums.
```csharp
[SFSessionPropertyAttr(required = true)]
PASSWORD,

[SFSessionPropertyAttr(required = false, defaultValue = "443")]
PORT,
```
At first glance this seems to be a more reasonable use, but when we dig deeper we find a very serious problem.
The code below treats attributes on the `SFSessionProperty` enum as hidden global variables. If, after making a request using a proxy, you then make a request that doesn't use a proxy, it will behave incorrectly.

```csharp
// Based on which proxy settings have been provided, update the required settings list
if (useProxy)
{
    // If useProxy is true, then proxyhost and proxy port are mandatory
    SFSessionProperty.ProxyHost.GetAttribute<SFSessionPropertyAttribute>().Required = true;
    SFSessionProperty.ProxyPort.GetAttribute<SFSessionPropertyAttribute>().Required = true;

    // If a username is provided, then a password is required
    if (properties.ContainsKey(SFSessionProperty.ProxyUser))
    {
        SFSessionProperty.ProxyPassword.GetAttribute<SFSessionPropertyAttribute>().Required = true;
    }
}
```
This code will have to be removed and replaced with `CheckSessionProperties(properties, useProxy);`. After that, we can correct the mistakes in `SFSessionPropertyAttr`.
The `SFSessionPropertyAttr` attribute is being used correctly here, but there are a couple of minor implementation mistakes.

```csharp
class SFSessionPropertyAttr : Attribute
{
    public string defaultValue { get; set; }
    public bool required { get; set; }
}
```

Specifically, the mistakes are:

- The name should have the suffix `Attribute`. The C# compiler looks for this suffix and allows you to omit it when applying an attribute to a construct.
- It is missing the `AttributeUsage` attribute. This tells the compiler where an attribute can be used so that it can warn the programmer of mistakes.
- The property names are cased incorrectly.
The fixed attribute can be seen below.
```csharp
[AttributeUsage(AttributeTargets.Field)]
class SFSessionPropertyAttribute : Attribute
{
    public string DefaultValue { get; init; }
    public bool Required { get; init; }
}
```
The usage of the attribute also changes slightly.
Before:
```csharp
[SFSessionPropertyAttr(required = false)]
SCHEMA,

[SFSessionPropertyAttr(required = false, defaultValue = "https")]
SCHEME,

[SFSessionPropertyAttr(required = true, defaultValue = "")]
USER,
```

After:

```csharp
[SFSessionProperty]
SCHEMA,

[SFSessionProperty(DefaultValue = "https")]
SCHEME,

[SFSessionProperty(Required = true, DefaultValue = "")]
USER,
```
To support versions prior to .NET 5, this code is needed. Without it, the `init` keyword doesn't compile.

```csharp
#if !NET5_0_OR_GREATER
namespace System.Runtime.CompilerServices;

internal static class IsExternalInit { }
#endif
```
For some reason, the type of the `connection` field is `DbConnection` instead of `SnowflakeDbConnection`, which in turn opens the door for unsafe casts such as:

```csharp
var session = (connection as SnowflakeDbConnection).SfSession;
```
What do all of these error conditions have in common?
```csharp
// Unsetting connection not supported.
throw new SnowflakeDbException(SFError.UNSUPPORTED_FEATURE);

// Must be of type SnowflakeDbConnection.
throw new SnowflakeDbException(SFError.UNSUPPORTED_FEATURE);

// Connection already set.
throw new SnowflakeDbException(SFError.UNSUPPORTED_FEATURE);
```

They all return exactly the same error message. The actual error description is only available as a comment. We can fix that, and at the same time put in more appropriate exception types.

```csharp
throw new InvalidOperationException("Unsetting the connection not supported.");
throw new ArgumentException("Connection must be of type SnowflakeDbConnection.", nameof(DbConnection));
throw new InvalidOperationException("Connection already set.");
```

Error messages aside, this series of `if` statements can be hard to follow.
```csharp
set
{
    if (value == null)
    {
        if (connection == null)
        {
            return;
        }

        // Unsetting connection not supported.
        throw new SnowflakeDbException(SFError.UNSUPPORTED_FEATURE);
    }

    if (!(value is SnowflakeDbConnection))
    {
        // Must be of type SnowflakeDbConnection.
        throw new SnowflakeDbException(SFError.UNSUPPORTED_FEATURE);
    }

    var sfc = (SnowflakeDbConnection)value;
    if (connection != null && connection != sfc)
    {
        // Connection already set.
        throw new SnowflakeDbException(SFError.UNSUPPORTED_FEATURE);
    }

    connection = sfc;
    sfStatement = new SFStatement(sfc.SfSession);
}
```
The first change is to move the guard statement up top. If the connection is already set, there is no reason to go through the rest of the branches.
Then we can switch on the `value`. Now it's easy to see the three possible values by glancing at the `case` labels.
```csharp
set
{
    if (connection != null && connection != value)
        throw new InvalidOperationException("Connection already set.");

    switch (value)
    {
        case null:
            if (connection == null)
                return;
            else
                throw new InvalidOperationException("Unsetting the connection not supported.");

        case SnowflakeDbConnection sfc:
            connection = sfc;
            sfStatement = new SFStatement(sfc.SfSession);
            return;

        default:
            throw new ArgumentException("Connection must be of type SnowflakeDbConnection.", nameof(DbConnection));
    }
}
```
A common beginner mistake is to discard the stack trace of an exception.
```csharp
catch (Exception ex)
{
    logger.Error("The command failed to execute.", ex);
    throw ex;
}
```

This can be fixed by using `throw;` instead of `throw ex;`.
The static method `convertToBindList` is only called in two places. In both cases, its only argument is a field in the same class. That means it can be converted into an instance method that reads the field directly.

Instead of returning a null, the method `convertToBindList` could return an empty dictionary. At the cost of a small allocation, we can eliminate the need for null checks down the line.
Speaking of which, nullable reference types are not enabled in this library. That will have to be addressed.
The type of exception being thrown is important to the caller. It gives them hints about what went wrong so they know where to start their research. Some exceptions such as `InvalidOperationException` say "you can't do that now" while `NotSupportedException` says "you can never do that".

```csharp
throw new Exception("Can't execute command when connection has never been opened");
```

This error message has the word "when" in it, suggesting that it should be an `InvalidOperationException`.
For more information on this concept, see Designing with Exceptions in .NET SEP.
This method and its async version are ignoring the `CommandBehavior` parameter. Fixing this will require a change to `SnowflakeDbDataReader` as well.
This is the code from `DbConnection`, as reported by Visual Studio.

```csharp
public abstract string ConnectionString
{
    get;
    [param: AllowNull]
    set;
}
```
For our first attempt, we just copy it into `SnowflakeDbConnection`.

```csharp
public override string ConnectionString { get; [param: AllowNull] set; }
```

But that gives this compiler error.

```text
Error CS8765: Nullability of type of parameter 'value' doesn't match overridden member (possibly because of nullability attributes).
```

What it really wants is this.

```csharp
[AllowNull]
public override string ConnectionString { get; set; }
```
But now we have a non-nullable property that can store and return a null. To fix that issue, we need to normalize incoming nulls to empty strings.
```csharp
string _connectionString = "";

[AllowNull]
public override string ConnectionString
{
    get => _connectionString;
    set => _connectionString = value ?? "";
}
```
Since the `AllowNull` attribute doesn't exist in older frameworks, we need to add it to the project with a conditional compiler flag.

```csharp
#if !NETCOREAPP3_1_OR_GREATER
namespace System.Diagnostics.CodeAnalysis
{
    [AttributeUsage(AttributeTargets.Property | AttributeTargets.Field | AttributeTargets.Parameter, Inherited = false)]
    sealed class AllowNullAttribute : Attribute
    {
        public AllowNullAttribute()
        {
        }
    }
}
#endif
```
The fields in this project use a mixture of `camelCase`, `_camelCase`, and `PascalCase`. The normal `camelCase` is not acceptable because it conflicts with parameter names in constructors. And `PascalCase` conflicts with property names when properties need to be manually implemented. That leaves `_camelCase` as the natural choice for this project. And if this were a straight refactoring job, that's what we'd use.

However, the intention is to make this into a maintained Tortuga Research project. As such, we're going to go with the `m_PascalCase` naming convention to be consistent with other Tortuga Research projects.
Since the `Prepare` method isn't implemented, we can block calls to it at compile time using `Obsolete` and hide it from the IDE using `EditorBrowsable`. And we'll change it to throw a `NotSupportedException` to make it clear that it isn't a planned feature for the future.

```csharp
[Obsolete($"The method {nameof(Prepare)} is not implemented.", true)]
[EditorBrowsable(EditorBrowsableState.Never)]
public override void Prepare() => throw new NotSupportedException();
```
Enabling Nullable Reference Types reveals a bug in the `SetStatement` method. Specifically, it is missing a null check on the connection parameter.
In some of the methods, you will also see this pattern.
```csharp
SetStatement();
return sfStatement.Execute[...]
```

The null checker doesn't understand this. It can't detect that `SetStatement` ensures `sfStatement` isn't null.
We can, however, change `SetStatement` to return a non-null reference to `sfStatement`. (We would have liked to remove the `sfStatement` field entirely, but it's needed for the `Cancel` method.)
Another change we can make is removing the initializer for `sfStatement` in the `DbConnection` property. This isn't needed because it will just be replaced when the command is executed.

The default value of `CommandTimeout` is already 0, so we don't need to explicitly set it.

The `parameterCollection` field can be initialized in its declaration.

As with the `SnowflakeDbCommandBuilder`, properties whose setters shouldn't be called can be blocked. And since they don't provide useful information, the property will be hidden as well.
```csharp
[EditorBrowsable(EditorBrowsableState.Never)]
public override bool DesignTimeVisible
{
    get => false;

    [Obsolete($"The {nameof(DesignTimeVisible)} property is not supported.", true)]
    set => throw new NotSupportedException($"The {nameof(DesignTimeVisible)} property is not supported.");
}
```
Where possible, fields are marked as `readonly`. This doesn't help performance, but it does act as a form of documentation. Developers glancing at the list of fields can quickly spot which ones are mutable.
The properties `DbCommandBuilder.QuotePrefix` and `DbCommandBuilder.QuoteSuffix` have setters because some databases support more than one quoting style.

Since that's not the case for Snowflake, the properties should be changed to return a constant and the setters marked as not supported. To make this 100% clear to callers, we can further restrict them by marking them as obsolete.

```csharp
[Obsolete($"The property {nameof(QuoteSuffix)} cannot be changed.", true)]
set => throw new NotSupportedException($"The property {nameof(QuotePrefix)} cannot be changed.");
```
In a few places, we see lines of code like this:

```csharp
return string.Format(CultureInfo.InvariantCulture, "{0}", parameterOrdinal);
```

While it will return the correct value, the `Format` function is slow and shouldn't be used if alternatives such as `ToString` are available.

```csharp
return parameterOrdinal.ToString(CultureInfo.InvariantCulture);
```

Here's another example:

```csharp
return string.Format(CultureInfo.InvariantCulture, "{0}", parameterName);
```

We can remove the `Format` call by transforming it into this:

```csharp
return parameterName.ToString(CultureInfo.InvariantCulture);
```

And according to the documentation, that just returns the original string. Leaving us with:

```csharp
return parameterName;
```
Enabling nullable reference types in this file is as easy as adding `#nullable enable` and a couple of `?` characters.

In several places, `taskCompletionSource.SetResult` is called with a `null`. While not really problematic, this causes the null checker to complain. So a statically defined marker object can be passed to it instead.

```csharp
static object s_markerObject = new();

taskCompletionSource.SetResult(s_markerObject);
```
The `Database` property is defined by the expression

```csharp
_connectionState == ConnectionState.Open ? SfSession?.database : string.Empty;
```

This is safe, but the null checker doesn't know that. Thankfully, it can be simplified to just `SfSession?.database ?? ""`.

The `SfSession` property is set by a call to `SetSession`, so again the null checker doesn't understand. Fortunately this only occurs in two places, so the `!` modifier can quiet the compiler.

This class has a finalizer, which means `GC.SuppressFinalize(this)` must be added to the `Dispose(bool)` method to avoid unnecessarily adding the object to the finalizer queue.
It should be noted that Microsoft no longer recommends implementing finalizers. So we will just eliminate it.
Consider this method.
```csharp
private void OnSessionEstablished()
{
    _connectionState = ConnectionState.Open;
}
```

Its sole purpose is to set the connection state to `Open`. There's no reason to believe it will grow with time. Nor are there any complex expressions whose duplication we need to avoid. So this method can be inlined.
Normally calls to `Close` are simply forwarded to `Dispose`. But in the case of `DbConnection`, the script is flipped: the call to `Dispose` is forwarded to `Close`. And even after the object is closed/disposed, it can be reopened.
In order to properly support this, a change needs to be made.
```csharp
protected override void Dispose(bool disposing)
{
    //Remove this check, it prevents a re-opened connection from being disposed.
    //if (_disposed)
    //    return;

    try
    {
        this.Close();
    }
    catch (Exception)
    {
        // Prevent an exception from being thrown when disposing of this object
    }

    //we no longer read from this field
    //_disposed = true;

    //Not needed. ComponentBase.Dispose(bool) does nothing.
    //base.Dispose(disposing);
}
```

The exception being swallowed is a common design pattern for .NET. It is needed for `finally` blocks to work correctly.
- Removed unused fields.
- Removed unused private constructor.
- Removed unnecessary call to `GC.SuppressFinalize`.
- Removed the explicit interface implementation.
- Fixed the strongly typed methods.
On that last point, this code is wrong.
```csharp
new public SnowflakeDbCommand SelectCommand
{
    get { return _selectCommand; }
    set { _selectCommand = value; }
}
```

If the caller uses `DbDataAdapter.SelectCommand` instead of `SnowflakeDbDataAdapter.SelectCommand`, they will get the wrong value.
Instead, the code should have looked like this:
```csharp
new public SnowflakeDbCommand? SelectCommand
{
    get { return (SnowflakeDbCommand?)base.SelectCommand; }
    set { base.SelectCommand = value; }
}
```

Note how it reuses the base class's property for storing the value. This isn't 100% type safe, but it's what we need to do in order to prevent hard-to-detect bugs.
As per modern conventions, we're going to use expression bodies for simple methods and properties. If the expression has something interesting that the developer needs to be aware of, we'll leave it in the older block style. For example, in `GetByte` we want to call out that all but the first element in the byte array is being discarded. So we leave that on its own line.

```csharp
public override byte GetByte(int ordinal)
{
    var bytes = m_ResultSet.GetValue<byte[]>(ordinal);
    return bytes[0];
}
```
We are also using `var` where possible to reduce unnecessary boilerplate. This will also make refactoring easier, as types won't need to be changed in multiple places.

The basic pattern for implementing `GetEnumerator` for a `DbDataReader` is this:
```csharp
public override IEnumerator GetEnumerator()
{
    return new Enumerator(this);
}

class Enumerator : IEnumerator
{
    SnowflakeDbDataReader _parent;

    public Enumerator(SnowflakeDbDataReader parent)
    {
        _parent = parent;
    }

    object IEnumerator.Current => _parent;

    bool IEnumerator.MoveNext()
    {
        return _parent.Read();
    }

    void IEnumerator.Reset()
    {
        throw new NotSupportedException();
    }
}
```
With `yield return`, we can simplify it to:

```csharp
public override IEnumerator GetEnumerator()
{
    while (Read())
    {
        yield return this;
    }
}
```
When the `CommandBehavior.CloseConnection` flag is set, closing the `DbDataReader` must result in the associated connection being closed.
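A sketch of how the reader could honor the flag, reusing the field names from the constructor shown below; the exact implementation is an assumption.

```csharp
public override void Close()
{
    base.Close();

    // CommandBehavior.CloseConnection means the reader owns the connection's lifetime.
    if (m_CommandBehavior.HasFlag(CommandBehavior.CloseConnection))
        m_Connection.Close();
}
```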
This is a minor change, but it is less error prone to use `nameof(dataOffset)` instead of `"dataOffset"`.

This method always returns false, so we'll mark it as `EditorBrowsableState.Never`. But since calling it won't throw an exception, we don't need to take the extra step of marking it obsolete.

These are modified to return the invalid value as part of the exception.

```csharp
// Before
throw new ArgumentOutOfRangeException(nameof(dataOffset), "Non negative number is required.");

// After
throw new ArgumentOutOfRangeException(nameof(dataOffset), dataOffset, "Non negative number is required.");
```
In theory, the parameters for the `SnowflakeDbDataReader` constructor will never be null. In practice mistakes happen, so null checks are added as a precaution.

```csharp
internal SnowflakeDbDataReader(SFBaseResultSet resultSet, SnowflakeDbConnection connection, CommandBehavior commandBehavior)
{
    m_ResultSet = resultSet ?? throw new ArgumentNullException(nameof(resultSet), $"{nameof(resultSet)} is null.");
    m_Connection = connection ?? throw new ArgumentNullException(nameof(connection), $"{nameof(connection)} is null.");
    m_CommandBehavior = commandBehavior;
    m_SchemaTable = PopulateSchemaTable(resultSet);
    RecordsAffected = resultSet.CalculateUpdateCount();
}
```
Remove the assignments for `SFDataType` and `OriginType` in the default constructor, as they will be automatically initialized to that value.

By convention, parameters are in `camelCase`.
Again, we'll mark the overridden property as obsolete.
Enabling null checking on this class catches potential null reference exceptions in `SnowflakeDbCommand`.
This is an old collection design from a time when .NET didn't have generics. So a lot of runtime casts are needed.
This should be a read-only object field.
It should have never existed in the first place. But it does, so it should be honored.
The `tryCastThrow` method isn't needed. The built-in type check when casting is sufficient.

A type-safe version of `Add` should be created.
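A sketch of that overload, following the usual ADO.NET pattern of returning the added parameter; the exact shape is an assumption.

```csharp
public SnowflakeDbParameter Add(SnowflakeDbParameter value)
{
    if (value == null)
        throw new ArgumentNullException(nameof(value));

    _parameterList.Add(value);
    return value;
}
```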
This code:
```csharp
public override void AddRange(Array values)
{
    IEnumerator e = values.GetEnumerator();
    while (e.MoveNext())
    {
        parameterList.Add(tryCastThrow(e.Current));
    }
}
```
Can be converted into a normal for-each loop:
```csharp
public override void AddRange(Array values)
{
    foreach (SnowflakeDbParameter value in values)
        _parameterList.Add(value);
}
```
The base case is easy to implement with a simple cast.
```csharp
public override void CopyTo(Array array, int index)
{
    _parameterList.CopyTo((SnowflakeDbParameter[])array, index);
}
```
For the full version, a bit more is needed.
```csharp
public override void CopyTo(Array array, int index)
{
    if (array is SnowflakeDbParameter[] sTypedArray)
        m_ParameterList.CopyTo(sTypedArray, index);
    else if (array is DbParameter[] dTypedArray)
        for (var i = 0; i < m_ParameterList.Count; i++)
            dTypedArray[i + index] = m_ParameterList[i];
    else if (array is IDataParameter[] iTypedArray)
        for (var i = 0; i < m_ParameterList.Count; i++)
            iTypedArray[i + index] = m_ParameterList[i];
    else if (array is IDbDataParameter[] idTypedArray)
        for (var i = 0; i < m_ParameterList.Count; i++)
            idTypedArray[i + index] = m_ParameterList[i];
}
```
Arguably we didn't need to take it quite this far, but it will make it easier on the caller.
Though now we run into a code duplication issue. To resolve it, we can leverage array covariance and cast the target collection to `object[]`.

```csharp
public override void CopyTo(Array array, int index)
{
    switch (array)
    {
        case SnowflakeDbParameter[] sTypedArray:
            m_ParameterList.CopyTo(sTypedArray, index);
            break;

        case DbParameter[]:
        case IDbDataParameter[]:
        case IDataParameter[]:
            var untypedArray = (object[])array;
            for (var i = 0; i < m_ParameterList.Count; i++)
                untypedArray[i + index] = m_ParameterList[i];
            break;

        default: throw new InvalidCastException($"{nameof(array)} is not a supported array type.");
    }
}
```
But what if there is another valid array type that we're not aware of? We can just try it and allow the runtime to throw the `InvalidCastException`.

```csharp
public override void CopyTo(Array array, int index)
{
    if (array is SnowflakeDbParameter[] sTypedArray)
        m_ParameterList.CopyTo(sTypedArray, index);
    else
    {
        var untypedArray = (object[])array;
        for (var i = 0; i < m_ParameterList.Count; i++)
            untypedArray[i + index] = m_ParameterList[i];
    }
}
```
At this point, we have to ask: why have the special case at all?

```csharp
public override void CopyTo(Array array, int index)
{
    var untypedArray = (object[])array;
    for (var i = 0; i < m_ParameterList.Count; i++)
        untypedArray[i + index] = m_ParameterList[i];
}
```
This function:
```csharp
public override int IndexOf(string parameterName)
{
    int index = 0;
    foreach (SnowflakeDbParameter parameter in _parameterList)
    {
        if (String.Compare(parameterName, parameter.ParameterName) == 0)
        {
            return index;
        }
        index++;
    }
    return -1;
}
```
should be just a normal for loop.
```csharp
public override int IndexOf(string parameterName)
{
    for (int i = 0; i < _parameterList.Count; i++)
        if (_parameterList[i].ParameterName == parameterName)
            return i;

    return -1;
}
```
There is no need to mark `_parameterList` as internal. Anything that reads it can read the class itself.

For `for` loops, this property is needed.

```csharp
public new SnowflakeDbParameter this[int index]
{
    get => _parameterList[index];
    set => _parameterList[index] = value;
}
```
Again, finalizers are no longer recommended.
This should throw an `ArgumentOutOfRangeException` in the constructor.

Just like in `SnowflakeDbConnection`, this should swallow exceptions.
The code in this class is surprisingly hard to read. There's not much going on, but the lines are so long that the important parts get lost in the noise. Fortunately, there are a few tricks to deal with that.
```csharp
if (!session.properties.TryGetValue(SFSessionProperty.PRIVATE_KEY_FILE, out var pkPath) &&
    !session.properties.TryGetValue(SFSessionProperty.PRIVATE_KEY, out var pkContent))
```

Since `session.properties` is used a lot, we can capture it in a local variable.

```csharp
var properties = session.properties;
[...]
if (!properties.TryGetValue(SFSessionProperty.PRIVATE_KEY_FILE, out var pkPath) &&
    !properties.TryGetValue(SFSessionProperty.PRIVATE_KEY, out var pkContent))
```
Next, we introduce a `using static` declaration so we don't need to repeat the enum name.

```csharp
using static Snowflake.Data.Core.SFSessionProperty;
[...]
if (!properties.TryGetValue(PRIVATE_KEY_FILE, out var pkPath) &&
    !properties.TryGetValue(PRIVATE_KEY, out var pkContent))
```
The output parameters of the `TryGet` methods aren't being used, so we can use discards.

```csharp
if (!properties.TryGetValue(PRIVATE_KEY_FILE, out _) &&
    !properties.TryGetValue(PRIVATE_KEY, out _))
```

Though in this case, if we really don't need the value then we can choose a different method.

```csharp
if (!properties.ContainsKey(PRIVATE_KEY_FILE) &&
    !properties.ContainsKey(PRIVATE_KEY))
```
Here is another trick to remove boilerplate.

```csharp
throw new SnowflakeDbException(
    SFError.INVALID_CONNECTION_STRING,
    new object[] { invalidStringDetail });
```

The type `object` is not necessary, as the compiler can infer it from the context.

```csharp
throw new SnowflakeDbException(
    SFError.INVALID_CONNECTION_STRING,
    new[] { invalidStringDetail });
```

Technically it's inferring a `string[]` array, because that's the type of object being put inside it. But arrays are covariant, which means you can give a `string[]` array to a function that expects an `object[]` array.

A strange thing about this call is that we didn't need to create the array at all. The parameter is marked with `params`, which means the compiler will create the array for us.

```csharp
throw new SnowflakeDbException(
    SFError.INVALID_CONNECTION_STRING,
    invalidStringDetail);
```
Part of the readability issue is that the statements were so long that they needed to be broken over multiple lines.
By using the file-scoped namespace declaration

```csharp
namespace Snowflake.Data.Core.Authenticator;
```

instead of the block-scoped form,

```csharp
namespace Snowflake.Data.Core.Authenticator
{
```

we remove a level of indentation. Four spaces don't sound like much, but it can be enough to get everything onto one line. Especially when combined with other techniques shown above.
Validation for each authenticator is performed in the `AuthenticationFactory` rather than the authenticators themselves. This is a problem because if those authenticators are created via any other means, the validation won't be performed. It can be fixed by moving the validation into the constructor of each authenticator class.

In idiomatic .NET code, fields are almost always private. Normally a protected field should be converted into a property with proper access controls.

The `ClientEnv` field is easy. We just need to change it from `protected` to `readonly`.

The `session` field is slightly trickier. It is set by the base class constructor, and then again in the subclass constructor. Once this redundancy is removed, it can be made `readonly`. It still needs to be `protected` because at least one subclass reads from it. So we make it a read-only property instead.
The last field has a few interesting characteristics.

- Its value is determined by a constant in each subclass (except `OktaAuthenticator`).
- It should never be modified.

This sounds more like an abstract, read-only property than a field.
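Sketches of both shapes; the property names here are assumptions.

```csharp
// The session field becomes a read-only property, set once by the base constructor.
protected SFSession Session { get; }

// The per-subclass constant becomes an abstract read-only property.
protected abstract string AuthName { get; }
```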
The `SetSpecializedAuthenticatorData` method incorrectly marks its parameter as `ref`. If a subclass were to replace the parameter `data` instead of just modifying it, the code would fail.

```csharp
protected abstract void SetSpecializedAuthenticatorData(ref LoginRequestData data);
```

Fortunately, none of the subclasses do this, so the `ref` flag can be removed.

In most subclasses, there is a static field named `AUTH_NAME`. This should be a constant.

Every subclass of `BaseAuthenticator` implements this interface, so it can be moved into that class.
Normally the interface methods would be exposed in the base class as `abstract` methods. But the interface methods `Authenticate` and `AuthenticateAsync` always call `Login` and `LoginAsync`.

This means we can combine them. Simply rename `Authenticate` and `AuthenticateAsync` to be `Login` and `LoginAsync`. Then make the real `Login` and `LoginAsync` methods `virtual`.
There are places where `Task.Run` is used haphazardly to call asynchronous methods from a synchronous context. This is essentially caused by the design of `HttpClient`, which originally didn't have any synchronous methods, and the design of `System.Data`, which originally only had synchronous methods.

To address this, a set of extension methods was created. For down-level frameworks, these simulate the synchronous methods that .NET 5 added to `HttpClient`. Then it stacks on synchronous counterparts for the asynchronous methods that don't have them yet.
While not a perfect solution, it will allow the library to slowly eliminate the 'sync over async' situations as `HttpClient` is improved.
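A sketch of what the down-level shim could look like. The class and method names are assumptions; the real `HttpClient.Send` only exists on .NET 5 and later.

```csharp
#if !NET5_0_OR_GREATER
using System.Net.Http;
using System.Threading;

static class HttpClientSynchronously
{
    // Sync-over-async still happens here, but it's now confined to one
    // well-marked place instead of scattered Task.Run calls.
    public static HttpResponseMessage Send(this HttpClient client,
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        return client.SendAsync(request, cancellationToken).GetAwaiter().GetResult();
    }
}
#endif
```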
This interface is marked as `internal` and thus can't be used for mocking. The only thing that implements it is `BaseAuthenticator`. At this point there is no purpose to it, and thus it can be deleted.

To match .NET naming conventions for base classes, `BaseAuthenticator` simply becomes `Authenticator`. The `Base` prefix is unnecessary.

Another naming convention is that namespaces should be plural. So the `Core.Authenticator` namespace is renamed to `Authenticators`.

As a public class, `SFConfiguration` is moved out of `Core` and renamed to `SnowflakeDbConfiguration` to be more consistent with other public classes.
Change the public fields into properties.
The static method `Instance` should also be a property. To make it more descriptive, it will also be renamed to `Default`.

The `SFConfiguration` class doesn't make sense as a singleton. If we really want only one instance, then we could just replace it with a static class and save some code.
But what if the application needs two instances of the configuration? Perhaps some calls need to use the V2 Json parser and some need the original.
In order to facilitate this, the following changes will be made.

- The `Instance` property will be renamed to `Default`.
- The `SnowflakeDbConfiguration` class will be made into an immutable record. This is to avoid accidentally changing the default when you think you're changing a copy.
- The `SnowflakeDbConfiguration.Default` property will gain a setter.
This gives you two ways to change the default configuration:
```csharp
//Modify a copy of the existing record
SnowflakeDbConfiguration.Default = SnowflakeDbConfiguration.Default with { ChunkDownloaderVersion = 2 };

//Create a new object
SnowflakeDbConfiguration.Default = new(useV2JsonParser: true, useV2ChunkDownloader: false, chunkDownloaderVersion: 3);
```

Then to add flexibility, the configuration will flow through the object model:

- The `SnowflakeDbConnection` object will have a property called `Configuration`.
- The configuration will flow through the session and result set to the response processing components.
This isn't used anywhere so it can be deleted.
- Add `#nullable enable` to each file in the namespace. Update properties where appropriate.
- Add `#nullable enable` to each file in the namespace. Update properties where appropriate.
- Change the public fields in `EncryptionData` to properties.
- Change the public fields in `KeyWrappingMetadataInfo` to properties.
- Replace readonly strings with constants.
- Standardize field names.
- Mark fields as readonly where possible.
- Replace async calls such as `HttpClient.GetStreamAsync` with sync calls such as `HttpClient.GetStream`. This allows us to remove the `Task.Wait()` calls.
- Mark methods as static where possible.
- You don't need to check if a directory already exists before creating it. `Directory.CreateDirectory` does both.
- Add `using` statements to memory streams. (Technically not necessary, but will prevent compiler warnings later.)
- The `SFRemoteStorageUtil.DownloadOneFile` method can throw a null exception.
- Use `var` where appropriate.
- In `SFSnowflakeAzureClient`, `m_BlobServiceClient` can be null. But some methods assume it won't be.
The `SFGCSClient.GetFileHeader` method reports errors by modifying an input parameter. This is fundamentally wrong, but there isn't an obvious way to fix it. And it is causing problems with null checks.

Methods in `SFRemoteStorageUtil` destroy the stack trace when there is an error.
- Add `#nullable enable` to each file in the namespace. Update properties where appropriate.
- Use `var` where appropriate.
- Add `using` statements where appropriate.
- Check parameters for null values.
- Remove unused fields.
- Lined up parameters in method signatures and calls.
- Add `#nullable enable` to each file in the namespace. Update properties where appropriate.
- Assume any null checks on messages that don't currently exist aren't needed. (Long term, appropriate null checks should be added.)
- Use string interpolation.
- Add `#nullable enable` to each file in the namespace. Update properties where appropriate.
- Mark static strings as const.
- Lined up parameters in method signatures and calls.
- Drop the `IRestRequest` interface. The `BaseRestRequest` class serves the same purpose.
- Rename `BaseRestRequest` to just `RestRequest`.
- Make `RestRequest.ToRequestMessage` abstract.
- Use `var` where appropriate.
- Create real sync methods in `RestRequester`. This uses the `HttpContentSynchronously` extensions we created in round 18.
- Fix missing stack traces in `RestRequester`.
- Changed the proxy test to accept a connection timeout as a valid result.
- Updated other tests to not expect an aggregate exception.
Now that we are making sync calls to `HttpClient`, we need to extend the `RetryHandler` to support them. This class modifies the behavior of an `HttpClient` directly.

A bit of non-standard code is the backoff delay. We can't cancel a `Thread.Sleep` with a `CancellationToken`, but we can simulate it using small sleeps and checking the token.
```csharp
var totalBackoffUsed = 0;
while (totalBackoffUsed < backOffInSec && !cancellationToken.IsCancellationRequested)
{
    Thread.Sleep(TimeSpan.FromSeconds(1));
    totalBackoffUsed += 1;
}
```
While investigating the `RetryHandler` issue, we came across this awkward bit of error handling.

```csharp
public override void Open()
{
    SetSession();
    try
    {
        SfSession!.Open();
    }
    catch (Exception e)
    {
        // Otherwise when Dispose() is called, the close request would timeout.
        m_ConnectionState = ConnectionState.Closed;
        if (!(e is SnowflakeDbException))
            throw new SnowflakeDbException(e, SnowflakeDbException.CONNECTION_FAILURE_SSTATE, SFError.INTERNAL_ERROR, "Unable to connect. " + e.Message);
        else
            throw;
    }
    m_ConnectionState = ConnectionState.Open;
}

public override Task OpenAsync(CancellationToken cancellationToken)
{
    RegisterConnectionCancellationCallback(cancellationToken);
    SetSession();
    return SfSession!.OpenAsync(cancellationToken).ContinueWith(
        previousTask =>
        {
            if (previousTask.IsFaulted)
            {
                // Exception from SfSession.OpenAsync
                var sfSessionEx = previousTask.Exception!;
                m_ConnectionState = ConnectionState.Closed;
                throw new SnowflakeDbException(sfSessionEx, SnowflakeDbException.CONNECTION_FAILURE_SSTATE, SFError.INTERNAL_ERROR, "Unable to connect");
            }
            else if (previousTask.IsCanceled)
            {
                m_ConnectionState = ConnectionState.Closed;
            }
            else
            {
                // Only continue if the session was opened successfully
                m_ConnectionState = ConnectionState.Open;
            }
        },
        cancellationToken);
}
```
In `Open`, the code's intent would be much clearer if multiple catch blocks were used.
```csharp
public override void Open()
{
    SetSession();
    try
    {
        SfSession!.Open();
    }
    catch (SnowflakeDbException)
    {
        m_ConnectionState = ConnectionState.Closed;
        throw;
    }
    catch (Exception e) when (e is not SnowflakeDbException)
    {
        // Otherwise when Dispose() is called, the close request would timeout.
        m_ConnectionState = ConnectionState.Closed;
        throw new SnowflakeDbException(e, SnowflakeDbException.CONNECTION_FAILURE_SSTATE, SFError.INTERNAL_ERROR, "Unable to connect. " + e.Message);
    }
    m_ConnectionState = ConnectionState.Open;
}
```
The `OpenAsync` function uses a really old style of working with tasks. This style was appropriate around the .NET 4 era, but it shouldn't be used in modern code.

You can see here how using async/await makes the code nearly identical to the sync version.
```csharp
public override async Task OpenAsync(CancellationToken cancellationToken)
{
    RegisterConnectionCancellationCallback(cancellationToken);
    SetSession();
    try
    {
        await SfSession!.OpenAsync(cancellationToken).ConfigureAwait(false);
    }
    catch (SnowflakeDbException)
    {
        m_ConnectionState = ConnectionState.Closed;
        throw;
    }
    catch (TaskCanceledException) when (cancellationToken.IsCancellationRequested)
    {
        m_ConnectionState = ConnectionState.Closed;
        throw;
    }
    catch (Exception ex)
    {
        // Otherwise when Dispose() is called, the close request would timeout.
        m_ConnectionState = ConnectionState.Closed;
        throw new SnowflakeDbException(ex, SnowflakeDbException.CONNECTION_FAILURE_SSTATE, SFError.INTERNAL_ERROR, "Unable to connect. " + ex.Message);
    }
    m_ConnectionState = ConnectionState.Open;
}
```
- Add `#nullable enable` to each file in the namespace. Update properties where appropriate.
- Change `ParseChunk` to `ParseChunkAsync`. Add a real sync version named `ParseChunk`.
- Change callers of `ParseChunkAsync` to use `ParseChunk` when appropriate.
- Use `var` where appropriate.
- Assume any null checks on messages that don't currently exist aren't needed. (Long term, appropriate null checks should be added.)
- Lined up parameters in method signatures and calls.
- Mark classes static when they only contain static methods.
- Removed unnecessary `Task.Run` in `SFBlockingChunkDownloader.DownloadChunkAsync`.
- Mark fields as readonly where possible.
- Add missing `.ConfigureAwait(false)`.
- Fix casing on public properties in `SFResultChunk`.
- Add `#nullable enable` to each file in the namespace. Update properties where appropriate.
- Use `var` where appropriate.
- Mark fields as readonly where possible.
- Fix casing in parameter names.
When a required field or property is null, we throw an `InvalidOperationException` to alert the caller that the object isn't capable of completing the method call. For example,

```csharp
if (Url == null)
    throw new InvalidOperationException($"{nameof(Url)} is null");
```

This method was found in `SFResultSetMetaData`. It isn't used and can be removed.

```csharp
internal DataTable toDataTable()
{
    return null;
}
```
- Add `#nullable enable` to each file in the namespace. Update properties where appropriate.
- Lined up parameters in method signatures and calls.
- Change public fields into public properties.
- Assume any null checks on messages that don't currently exist aren't needed. (Long term, appropriate null checks should be added.)
- Remove unused fields.
While singletons are a well-established pattern, they shouldn't be used indiscriminately. The class `HttpUtil` implements no interfaces. Nor does it have a base class. So there is no reason to make it an object at all. It can instead be treated as any other static class.

In these two lines, the variable `timeoutInSec` has a value assigned to it. But before it is read, the value is overwritten. So the assignment should be removed to improve clarity. This in turn means `recommendedMinTimeoutSec` can be deleted entirely.

```csharp
int recommendedMinTimeoutSec = RestRequest.DEFAULT_REST_RETRY_SECONDS_TIMEOUT;
int timeoutInSec = recommendedMinTimeoutSec;
```
The `SFSessionProperties` class overrides `Equals` without overriding `GetHashCode`.

To suppress the compiler warning, a fake override was created. To make it more obvious what's happening, a `#pragma warning disable` will be used instead.
```csharp
public override int GetHashCode()
{
    return base.GetHashCode();
}
```
- Add `#nullable enable` to each file in the namespace. Update properties where appropriate.
- Lined up parameters in method signatures and calls.
- Standardize on C# names for types. For example, `int` vs `Int32`.
This internal class is only used by the tests, so it should be moved into the test project.
Use string interpolation instead of `string.Format` where appropriate. In most cases, the string interpolation will be easier to read.

```csharp
string loginTimeOut5sec = String.Format(ConnectionString + "connection_timeout={0}", timeoutSec);
```

```csharp
string loginTimeOut5sec = $"{ConnectionString}connection_timeout={timeoutSec}";
```

Once this change is made, it becomes more apparent that there is a `;` missing in the original code.

```csharp
string loginTimeOut5sec = $"{ConnectionString};connection_timeout={timeoutSec}";
```
Now that all of the classes have been updated to be null aware, the project can be marked with `<Nullable>enable</Nullable>`. This in turn means that `#nullable enable` can be removed from all of the individual files.
All classes, interfaces, enums, etc. in the `Core` namespace are changed from public to internal. None of these are meant to be accessed directly by users of the library.
Enable CodeMaid's Format on Save feature. Apply formatting to all files.
Compiler messages are less significant than warnings. Often these are stylistic choices, such as whether or not to use `var`. Occasionally, however, they do reflect a more serious issue. So it is best to clear them from time to time by either fixing the 'issue', suppressing the message, or reconfiguring the analyzer's settings. In this project, those settings are stored in the `.editorconfig` file.
The types of issues caught in this pass included…
- Use `var` where possible.
- Use target-typed `new` where possible.
- Mark fields as `readonly` where possible.
- Fix spelling errors.
- Use initializers.
- Capitalize methods correctly.
- Capitalize properties correctly. (Use `JsonProperty` in cases where the class may be serialized, as serialization may be case sensitive.)
- Fields that are never read.
- Methods that can be marked as `static`.
- Use `nameof(…)` when referring to parameter names.
- Replace string literals with char literals.
While adding `readonly`, it was noticed that `INJECT_WAIT_IN_PUT` could instead be marked as a constant. Once that was done, the compiler was able to detect that a `Thread.Sleep` call could never actually occur. This demonstrates the advantage of being as strict as possible when using `readonly` and `const`.

```csharp
const int INJECT_WAIT_IN_PUT = 0;

if (INJECT_WAIT_IN_PUT > 0)
    Thread.Sleep(INJECT_WAIT_IN_PUT);
```

One of the messages indicated that the `magicBytes` field was never read, so it could be removed. This in turn meant the matching constructor parameter could be removed.
The `?.` syntax can be used to remove explicit null checks.

```csharp
return v == null ? null : v.ToString();
```

```csharp
return v?.ToString();
```
Pattern matching can also be used to remove a null check.
```csharp
var parameter = value as SnowflakeDbParameter;
return parameter != null && m_ParameterList.Contains(parameter);
```

```csharp
return value is SnowflakeDbParameter parameter && m_ParameterList.Contains(parameter);
```
- Move classes into individual files.
- Rename `SFError` to `SnowflakeError` for consistency with other type names.
- Change assertions to use enums instead of hard-coded numbers.

```csharp
Assert.AreEqual(270053, e.ErrorCode);
```

```csharp
Assert.AreEqual(SnowflakeError.UnsupportedDotnetType, e.SnowflakeError);
```
Classes with only static members, such as `ChunkDownloaderFactory`, should be marked as `static`. Allowing objects of these types to be created is misleading to developers, as those objects would not be useful for anything.

The previous round of refactoring revealed that there are over 20 static utility classes. This is probably too many. While utility classes such as `System.Math` make a lot of sense, such a high count for one library suggests that there are some organizational issues.
Some of the static classes have methods that are only invoked from one place. For example, `ChunkDownloaderFactory.GetDownloader` is only called by `SFResultSet`. And it takes an `SFResultSet` as a parameter.

Clearly `GetDownloader` belongs inside `SFResultSet`. And once there, it can be turned into a private instance method.

The `AuthenticatorFactory.GetAuthenticator` method is another example of this. Though it is used twice, in both cases it is being called from the same object, with that object passing itself as a parameter.

Fundamentally, there is nothing wrong with a static function taking a `this` argument. That's basically what a method is. It's just that we call the parameter `this` and put the matching argument on the left side of the `.`.
But from an organizational standpoint, it is preferable to put data and operations that act on that data together. That's the philosophical basis for object-oriented programming.
Extension methods are wonderful for adding functionality to a class from a separate library. Or for adding functionality to types that can't have methods, such as an `enum`. They also make sense when the `this` parameter could be null, such as `string.IsNullOrEmpty(string?)`.

They shouldn't be used when the function could be easily changed into a normal method. An example of this is `ResultSetUtil.CalculateUpdateCount`, which we will move into `SFBaseResultSet`.
The `ChunkParserFactory.GetParser` method is used to return instances of an `IChunkParser`. The two are so closely related that it doesn't really make sense to have them in distinct files. Whenever you are working with `IChunkParser`, you need to know about `ChunkParserFactory.GetParser`.

But they can't be together under the current design because interfaces like `IChunkParser` can't have static methods (yet).

What we can do is change `IChunkParser` into a base class and then move `GetParser` into it.
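A sketch of the result, assuming a single `ParseChunk` member; the actual signatures in the library may differ:

```csharp
// The interface becomes an abstract base class so the factory method can
// live beside the type it creates. IResultChunk and the concrete parser
// classes are existing library types, not shown here.
abstract class IChunkParser
{
    public abstract Task ParseChunk(IResultChunk chunk);

    // Formerly ChunkParserFactory.GetParser.
    public static IChunkParser GetParser(Stream stream) =>
        throw new NotImplementedException(); // same selection logic as before
}
```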
Consider this scenario. The static utility class `SFLocalStorageUtil` is used by `SFFileTransferAgent`. Can you match the descriptions with the class names?

- "Class responsible for uploading and downloading files to the remote client."
- "The storage client for local upload/download."

They both have the terms "client", "upload", and "download". At first glance, it appears that they are the same class, divided into two parts.

But wait, there is also `SFRemoteStorageUtil`. This is a far more complicated set of functions. But an interesting feature is that it shares two method signatures with `SFLocalStorageUtil`. So it appears that the relationship between the two utility classes is wrong.
Looking at how it is used, we can see the problem.

```csharp
if (StorageClientType.REMOTE == GetStorageClientType(m_TransferMetadata.StageInfo!))
    // Upload the file using the remote client SDK and the file metadata
    SFRemoteStorageUtil.UploadOneFileWithRetry(fileMetadata);
else
    // Upload the file using the local client SDK and the file metadata
    SFLocalStorageUtil.UploadOneFileWithRetry(fileMetadata);
```
The classes `SFLocalStorageUtil` and `SFRemoteStorageUtil` have a polymorphic relationship, but aren't treated as such. We are using this `if` statement to simulate virtual dispatch. Here are the steps; a sketch of the result follows the list.

- Create a base class called `SFStorage`.
- Rename `SFLocalStorageUtil` to `SFLocalStorage` and `SFRemoteStorageUtil` to `SFRemoteStorage`.
- Change `SFLocalStorage` and `SFRemoteStorage` to inherit from `SFStorage`.
- Make `UploadOneFileWithRetry` and `DownloadOneFile` into abstract methods in `SFStorage` and update the subclasses to match.
- Give `SFLocalStorage` and `SFRemoteStorage` a static `Instance` field and a private constructor.
- Change `SFFileTransferAgent.GetStorageClientType` to return a `SFStorage` subclass.
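A minimal sketch of the resulting hierarchy, with `FileMetadata` standing in for whatever parameter type the real methods accept:

```csharp
abstract class SFStorage
{
    public abstract void UploadOneFileWithRetry(FileMetadata fileMetadata);
    public abstract void DownloadOneFile(FileMetadata fileMetadata);
}

class SFLocalStorage : SFStorage
{
    public static readonly SFLocalStorage Instance = new();
    SFLocalStorage() { } // private constructor; callers use Instance

    public override void UploadOneFileWithRetry(FileMetadata fileMetadata) { /* former SFLocalStorageUtil code */ }
    public override void DownloadOneFile(FileMetadata fileMetadata) { /* former SFLocalStorageUtil code */ }
}

// SFRemoteStorage follows the same pattern. The if statement then
// collapses into a single virtual call:
// GetStorageClientType(m_TransferMetadata.StageInfo!).UploadOneFileWithRetry(fileMetadata);
```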
The static classes `HttpUtil` and `Utilities` both deal with `HttpClient`-related classes, so they should be merged.

We could also merge them with the `HttpClientUtilities` classes. But those are generic enough that we'll be pulling them into their own library in the future.

From over 20 static classes, we're down to 11 library-specific static classes, 3 static classes that will be moved to a future library, and 1 static class for legacy .NET support.
Consider this line:

```csharp
static readonly Dictionary<string, HttpClient> s_HttpClients = new();
```

Because it is a mutable collection in a static field, we have to look around to see if it is being used correctly. If it's not, there could be a race condition that corrupts the dictionary, after which subsequent calls to it may always fail.
One way to protect it is with a lock. So is there a lock around `s_HttpClients`? At first glance, it doesn't look like it.
```csharp
static HttpClient RegisterNewHttpClientIfNecessary(HttpClientConfig config)
{
    var name = config.ConfKey;
    if (!s_HttpClients.ContainsKey(name))
    {
        var httpClient = new HttpClient(new RetryHandler(setupCustomHttpHandler(config)))
        {
            Timeout = Timeout.InfiniteTimeSpan
        };

        // Add the new client key to the list
        s_HttpClients.Add(name, httpClient);
    }

    return s_HttpClients[name];
}
```
But actually, `RegisterNewHttpClientIfNecessary` is currently called in only one place, and that place does have a lock.
```csharp
internal static HttpClient GetHttpClient(HttpClientConfig config)
{
    lock (s_HttpClientProviderLock)
    {
        return RegisterNewHttpClientIfNecessary(config);
    }
}
```
This is not acceptable. Every future developer who looks at this code is going to have to review it for potential race conditions. They can't just look at any one place to understand whether or not it's written correctly.
Fortunately, there is an easy alternative that uses a `ConcurrentDictionary`, which eliminates both the lock and the private function.
```csharp
internal static HttpClient GetHttpClient(HttpClientConfig config)
{
    var name = config.ConfKey ?? throw new ArgumentException($"{nameof(config.ConfKey)} is null", nameof(config));
    return s_HttpClients.GetOrAdd(name, _ => new HttpClient(new RetryHandler(setupCustomHttpHandler(config))) { Timeout = Timeout.InfiniteTimeSpan });
}
```
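For this to compile, the field itself also changes type (with a `using System.Collections.Concurrent;` directive):

```csharp
static readonly ConcurrentDictionary<string, HttpClient> s_HttpClients = new();
```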
Taking a more holistic look at `HttpUtil`, one has to ask why it is a singleton class rather than just a static module. It implements no interfaces and is sealed, so there are no opportunities for polymorphism. The constructor is private, so there will never be more than one instance.

Looking further, it is only used by `SFSession` and is specially designed for that class's needs. It isn't a generic utility class at all. Furthermore, it is at the heart of what `SFSession` does, which is managing HTTP connections.
So this should have been moved into `SFSession` in the previous round of refactoring, leaving just `GetOptionOrDefault` and `SetOption` on their own.

Once in `SFSession`, the `GetHttpClient` method can be inlined. We'll leave its helper function, `SetupCustomHttpHandler`, alone as it is quite long and complex.
The C# compiler has several "waves" of compiler warnings that are disabled by default. These should be enabled incrementally to avoid overwhelming the developer. Once caught up, you can use `<AnalysisLevel>latest-all</AnalysisLevel>` to ensure new warnings are seen.
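A minimal sketch of the project file setting:

```xml
<PropertyGroup>
  <!-- Opt in to every current and future analyzer wave. -->
  <AnalysisLevel>latest-all</AnalysisLevel>
</PropertyGroup>
```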
In order to avoid unexpected runtime errors when running on non-English systems, any code that deals with string comparisons should indicate the type of comparison to be made. Usually this means `Ordinal`, `OrdinalIgnoreCase`, or `InvariantCulture`.

When given the option, the C# compiler prefers ordinal over invariant.
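A hypothetical example of the difference; the `name` variable and prefix are illustrative:

```csharp
// StartsWith(string) uses the current culture, which can behave
// differently on non-English systems (the Turkish 'i' being the classic case).
bool implicitCulture = name.StartsWith("SNOWFLAKE_");

// The comparison type is explicit and culture-independent.
bool ordinal = name.StartsWith("SNOWFLAKE_", StringComparison.Ordinal);
```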
According to CA1813, performance can be improved by marking attributes as `sealed`.
Several warnings deal with exceptions.

- CA2237: Add `[Serializable]` to `SnowflakeDbException` as this type implements `ISerializable`
- CA1032: Add the following constructor to `SnowflakeDbException`: `public SnowflakeDbException()`
- CA1032: Add the following constructor to `SnowflakeDbException`: `public SnowflakeDbException(string message)`
- CA1032: Add the following constructor to `SnowflakeDbException`: `public SnowflakeDbException(string message, Exception innerException)`
We can ignore the first one because we don't intend to support the deprecated .NET 1 style of serialization. It only implements `ISerializable` via a base class.

The others are trickier. We really do want someone to always provide a `SnowflakeError` value, so we're going to have to ignore this advice.
To avoid unexpected runtime behavior, `TaskScheduler.Default` needs to be explicitly specified. Otherwise, it will use the current `TaskScheduler`, which may not be appropriate for the situation.
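For example, a continuation that doesn't name a scheduler runs on `TaskScheduler.Current`, whatever that happens to be. The names here are illustrative:

```csharp
// The last argument pins the continuation to the default scheduler.
task.ContinueWith(
    t => LogCompletion(t),
    CancellationToken.None,
    TaskContinuationOptions.None,
    TaskScheduler.Default);
```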
The exception `NullReferenceException` should not be explicitly thrown. Instead, we should use `ArgumentNullException` or `InvalidOperationException` as appropriate.
By convention, all enums should have a `None` value.
In many locations, `IDisposable.Dispose` was not being invoked.

In some cases this is a false positive because the type doesn't really need to be disposed. In other cases, the holder of the object needs to be marked as `IDisposable` itself.
Some `await` calls are missing `.ConfigureAwait(false)`. This is important because there are times when not having it can lead to deadlocks.
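A typical library-code example, with illustrative names:

```csharp
// Library code should not resume on the caller's synchronization context;
// doing so is both unnecessary overhead and a deadlock risk.
var response = await httpClient.SendAsync(request, cancellationToken).ConfigureAwait(false);
```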
Ensure `protected override void Dispose(bool disposing)` always calls the base class's dispose method.
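A sketch of the pattern; `m_Reader` is a hypothetical owned resource:

```csharp
protected override void Dispose(bool disposing)
{
    if (disposing)
        m_Reader?.Dispose();

    base.Dispose(disposing); // easy to forget; this is what the warning catches
}
```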
When calling an async method, and we have a `CancellationToken`, we should provide said token to the method.

However, many legacy .NET methods don't accept a `CancellationToken`, which means we need to polyfill those methods. For example:
```csharp
#if !NET6_0_OR_GREATER
public static Task CopyToAsync(this HttpContent content, Stream stream, TransportContext? context, CancellationToken cancellationToken)
{
    cancellationToken.ThrowIfCancellationRequested();
    return content.CopyToAsync(stream, context);
}
#endif
```
This won't really work right for legacy frameworks, since the token is only checked once before the copy starts. But if the user upgrades to .NET 6 or later, the real overload will "light up".
Even though we are using Nullable Reference Types, the user of our library may not be. So public methods should still check for null arguments.
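A hypothetical guard clause; the method and field names are illustrative:

```csharp
public void Add(SnowflakeDbParameter parameter)
{
    // Protects callers who aren't compiling with nullable reference types enabled.
    if (parameter == null)
        throw new ArgumentNullException(nameof(parameter));

    m_ParameterList.Add(parameter);
}
```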
Fields shouldn't be explicitly initialized to their default values. This has an (incredibly minor) performance cost and is redundant.

```csharp
int m_Next = 0;
```
For encryption, use `System.Security.Cryptography.RandomNumberGenerator` to generate random numbers.
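A minimal sketch:

```csharp
using System.Security.Cryptography;

// Unlike System.Random, this source is suitable for cryptographic use,
// such as generating keys and initialization vectors.
var iv = new byte[16];
using (var rng = RandomNumberGenerator.Create())
    rng.GetBytes(iv);
```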
Classes that implement `ICollection` and/or `IEnumerable` should also implement `ICollection<T>` and `IEnumerable<T>`. The type-safe versions of the interfaces will be more useful to consumers.
If a Get method is simple, such as the one below, it should be a property instead.

```csharp
public string? GetQueryId() => m_ResultSet.m_QueryId;
```
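The property form might look like this, assuming `QueryId` as the name:

```csharp
public string? QueryId => m_ResultSet.m_QueryId;
```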
Where possible, use `const` instead of `readonly`. This makes the intention clearer, as a `const` cannot be overridden in a constructor the way a `readonly` field can.
While working through this, the naming conventions for the `SFDataType` enum were corrected.

This was a little tricky, as it did require adding a two-way map between the C# names and the names needed for SQL generation. The way it had been implemented was to assume that the C# name would be the SQL name. For example, `SFDataType.TIMESTAMP_LTZ` translates to `TIMESTAMP_LTZ` in SQL.

Rather than fighting with every future developer who wants to 'fix' it, we made these changes (a sketch of the map follows the list):

- Renamed the enum values to use the .NET naming convention, such as `SFDataType.TimestampLtz`.
- Removed any calls to `SFDataType.ToString()` or `Enum.Parse<SFDataType>`.
- Added `FromSql` and `ToSql` functions.
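A sketch of the two-way map. `FromSql`, `ToSql`, and `TimestampLtz` come from the list above; the class and field names here are illustrative:

```csharp
using System.Collections.Generic;
using System.Linq;

static class SFDataTypeSql
{
    static readonly Dictionary<SFDataType, string> s_SqlNames = new()
    {
        [SFDataType.TimestampLtz] = "TIMESTAMP_LTZ",
        // ...one entry per enum value
    };

    public static string ToSql(this SFDataType type) => s_SqlNames[type];

    public static SFDataType FromSql(string sqlName) =>
        s_SqlNames.First(pair => pair.Value == sqlName).Key;
}
```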
There is a general rule that in libraries, you shouldn't catch `System.Exception`. The idea is that if you are actually handling the error properly, you must know the specific exception up-front.

In the real world, that doesn't work. Some of the reasons why we may want to catch the generic exception include:

- Wrapping the exception with additional details.
- Returning the error state in a way other than by using exceptions.
- Discarding the exception because we don't care why the operation failed, only that it did. (This was very common before TryParse methods were created.)
- Discarding the exception because it is unsafe to report it. For example, a `Dispose` call, which may be invoked inside a `finally` block.
Due to the high number of false positives, this warning was disabled completely.
The full name of the connection class is `Tortuga.Data.Snowflake.SnowflakeDbConnection`. That's a bit long and redundant. In some places, the type name is abbreviated as simply "SF". So we would have `Tortuga.Data.Snowflake.SFDbConnection`.

At the same time, we can drop the `Db` part, as no other ADO.NET library seems to use that pattern. This further reduces it to `Tortuga.Data.Snowflake.SFConnection`, which is a bit more reasonable for a frequently used class.
- Update packages
- Create new NuGet package
- Run all tests in Release mode
When updating packages, we discovered that `Google.Cloud.Storage.V1` version 4 dropped support for .NET Standard 2.0. So for that one target, we have to roll back to `Google.Cloud.Storage.V1` version 3.7.
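One way to express that in the project file is with conditional package references. This is a sketch; the exact version 4 number below is illustrative:

```xml
<ItemGroup Condition="'$(TargetFramework)' == 'netstandard2.0'">
  <PackageReference Include="Google.Cloud.Storage.V1" Version="3.7.0" />
</ItemGroup>
<ItemGroup Condition="'$(TargetFramework)' != 'netstandard2.0'">
  <PackageReference Include="Google.Cloud.Storage.V1" Version="4.0.0" />
</ItemGroup>
```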