
Implementation for minTTL #6808

Draft: wants to merge 9 commits into base: main


@hkiiita (Member) commented Nov 13, 2024

This PR introduces a minTTL setting that helps address the problem of applications caching DNS response IPs indefinitely. Cluster administrators should be able to configure this value, ideally setting it equal to or greater than the maximum TTL value of the application's DNS cache.

This feature is a step towards resolving the issue of certain applications caching DNS response IPs indefinitely.

// The minTTL setting helps address the problem of applications caching DNS response IPs indefinitely.
// The Cluster administrators should configure this value, ideally setting it to be equal to or greater than the maximum TTL
// value of the application's DNS cache.
MinTTL uint32 `yaml:"minTTL,omitempty"`
Contributor

We avoid unsigned integers in the config as far as I can tell, so maybe use int (personally I think int32 would be better, but we do use int consistently in the config apparently, so probably best to stick to int).

build/yamls/antrea.yml (outdated)
Comment on lines 608 to 609
// If minTTL is greater than 1, it indicates that the value has been set externally by a cluster admin, and should be respected.
if o.config.MinTTL > 1 {
Contributor

why 1 and not 0?

Comment on lines 309 to 311
# The minTTL setting helps address the problem of applications caching DNS response IPs indefinitely.
# The Cluster administrators should configure this value, ideally setting it to be equal to or greater than the maximum TTL
# value of the application's DNS cache.
Contributor

I would remove "indefinitely". If the application does indeed cache the DNS entry forever, there is not much we can do.
We should also mention that this is for FQDN policy enforcement.
So maybe something like this:

The minTTL setting helps address the problem of applications caching DNS response IPs beyond the TTL value for the DNS record.
It is used to enforce FQDN policy rules, ensuring that resolved IPs are included in datapath rules for as long as the application is caching them.
This value should ideally be set to the maximum caching duration across all applications.

@@ -90,6 +90,7 @@ type Options struct {
nplEndPort int
dnsServerOverride string
nodeType config.NodeType
minTTL uint32
Contributor

this field doesn't seem necessary?

Comment on lines 641 to 647
getMaxTTL := func(ttl1, ttl2 uint32) uint32 {
if ttl1 > ttl2 {
return ttl1
} else {
return ttl2
}
}
Contributor

there is a built-in max function in Go: https://pkg.go.dev/builtin#max
It was introduced in Go 1.21.

currentTime := f.clock.Now()
for _, ans := range msg.Answer {
switch r := ans.(type) {
case *dns.A:
if f.ipv4Enabled {
responseIPs[r.A.String()] = ipWithExpiration{
ip: r.A,
-expirationTime: currentTime.Add(time.Duration(r.Header().Ttl) * time.Second),
+expirationTime: currentTime.Add(time.Duration(getMaxTTL(f.minTTL, r.Header().Ttl)) * time.Second),
Contributor

I feel like technically, minTTL should only apply to DNS responses which were not initiated by Antrea, but intercepted by Antrea (responses to DNS queries generated by the application). However, this code applies to responses to DNS queries sent by Antrea (when an override DNS server is configured). Maybe we need to introduce a flag to distinguish between the 2 cases?

cc @tnqn @Dyanngg

Contributor

We actually spent the whole of the last sync-up meeting discussing this, because I had the same suggestion, but @tnqn's opinion was that we don't have to differentiate between the two cases. For security purposes, we advise users to use FQDN rules only in allowlists for Antrea-native policies, and he thinks it's okay for a client's TTL for an FQDN to go "out of sync" with the Antrea Agent, since that's not the guarantee we want to provide: we only want to ensure that clients cannot access unintended addresses. So having an address for a domain cached with a longer TTL in the Antrea cache than in the client is OK. I'll let Quan chime in to confirm I'm summarizing this correctly, but the outcome of the discussion was that we told Hemant not to worry about differentiating between Antrea-initiated and client-initiated DNS queries.

Contributor

For security purpose we advise users to use FQDN rules only in allowlists for Antrea-native policies

Got it, makes sense to me

@hkiiita please ignore this comment

- Remove the redundant minTTL field from the Options struct.
- Use the built-in max function to compare TTL values.
- Improve the descriptive comment about minTTL in the config file.

Signed-off-by: Hemant <[email protected]>
# The minTTL setting helps address the problem of applications caching DNS response IPs beyond the TTL value for the DNS record.
# It is used to enforce FQDN policy rules, ensuring that resolved IPs are included in datapath rules for as long as the application is caching them.
# This value should ideally be set to the maximum caching duration across all applications.
minTTL: {{ .Values.minTTL }}
Contributor

the name is a bit generic, maybe we should go with fqdnCacheMinTTL?

Comment on lines 608 to 609
// Ensure that the minTTL is not negative.
o.config.MinTTL = max(o.config.MinTTL, 0)
Contributor

it would be better to return an error if o.config.MinTTL < 0 IMO

while this code is in validateK8sNodeOptions, I am not sure this configuration parameter is specific to the "K8sNode" case. Maybe the check should be in the parent function? cc @tnqn
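A minimal sketch of the suggestion above: reject a negative value with an error instead of clamping it, in a node-type-independent step. The names agentConfig and validateCommonOptions are hypothetical; the real code would use AgentConfig and whichever parent function dispatches to validateK8sNodeOptions and validateExternalNodeOptions.

```go
package main

import "fmt"

// agentConfig is a hypothetical, trimmed-down stand-in for AgentConfig that
// keeps only the field this check needs.
type agentConfig struct {
	FQDNCacheMinTTL int
}

// validateCommonOptions is a hypothetical node-type-independent check that a
// parent validation function could run before the K8sNode/ExternalNode checks.
func validateCommonOptions(c *agentConfig) error {
	// Reject invalid input instead of silently clamping it with max(v, 0).
	if c.FQDNCacheMinTTL < 0 {
		return fmt.Errorf("fqdnCacheMinTTL must be greater than or equal to 0")
	}
	return nil
}
```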

@@ -160,7 +161,7 @@ type fqdnController struct {
clock clock.Clock
}

-func newFQDNController(client openflow.Client, allocator *idAllocator, dnsServerOverride string, dirtyRuleHandler func(string), v4Enabled, v6Enabled bool, gwPort uint32, clock clock.WithTicker) (*fqdnController, error) {
+func newFQDNController(client openflow.Client, allocator *idAllocator, dnsServerOverride string, dirtyRuleHandler func(string), v4Enabled, v6Enabled bool, gwPort uint32, clock clock.WithTicker, minTTL int) (*fqdnController, error) {
Contributor

I wonder if minTTL should be uint32 here. This way we know it is non-negative. The conversion from int to uint32 can happen in cmd/antrea-agent.
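A minimal sketch of the int-to-uint32 conversion suggested above, as it might look in cmd/antrea-agent. The helper name toTTLSeconds is hypothetical. It widens to int64 before the upper-bound comparison so the check also compiles and behaves correctly on 32-bit platforms, where int cannot hold math.MaxUint32.

```go
package main

import (
	"fmt"
	"math"
)

// toTTLSeconds is a hypothetical helper: it converts the int config value
// into the uint32 passed to newFQDNController, rejecting out-of-range input.
func toTTLSeconds(v int) (uint32, error) {
	// Widening to int64 keeps the comparison valid on 32-bit platforms.
	if v < 0 || int64(v) > math.MaxUint32 {
		return 0, fmt.Errorf("fqdnCacheMinTTL %d is out of range [0, %d]", v, uint32(math.MaxUint32))
	}
	return uint32(v), nil
}
```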

# It is used to enforce FQDN policy rules, ensuring that resolved IPs are included in datapath rules for as long as the application is caching them.
# This value should ideally be set to the maximum caching duration across all applications.
-minTTL: {{ .Values.minTTL }}
+fqdnCacheMinTTL: {{ .Values.minTTL }}
Contributor

Suggested change
-fqdnCacheMinTTL: {{ .Values.minTTL }}
+fqdnCacheMinTTL: {{ .Values.fqdnCacheMinTTL }}

This is why the manifests are not generated correctly (fqdnCacheMinTTL: instead of fqdnCacheMinTTL: 0)

Member Author

Sorry, my bad. Will correct that.

Comment on lines 163 to 171
+switch o.config.NodeType {
+case config.ExternalNode.String():
o.nodeType = config.ExternalNode
return o.validateExternalNodeOptions()
-} else if o.config.NodeType == config.K8sNode.String() {
+case config.K8sNode.String():
o.nodeType = config.K8sNode
return o.validateK8sNodeOptions()
-} else {
+default:
return fmt.Errorf("unsupported nodeType %s", o.config.NodeType)
Contributor

this is not a bad change, but I would avoid doing it in this PR as it is unrelated

if o.config.NodeType == config.ExternalNode.String() {
// validate FqdnCacheMinTTL
if o.config.FqdnCacheMinTTL < 0 {
return fmt.Errorf("fqdnCacheMinTTL set to an invalid value, its must be a positive integer")
Contributor

Suggested change
-return fmt.Errorf("fqdnCacheMinTTL set to an invalid value, its must be a positive integer")
+return fmt.Errorf("fqdnCacheMinTTL must be greater than or equal to 0")

@@ -158,7 +158,7 @@ type AgentConfig struct {
// The minTTL setting helps address the problem of applications caching DNS response IPs indefinitely.
// The Cluster administrators should configure this value, ideally setting it to be equal to or greater than the maximum TTL
// value of the application's DNS cache.
-MinTTL int `yaml:"minTTL,omitempty"`
+FqdnCacheMinTTL int `yaml:"fqdnCacheMinTTL,omitempty"`
Contributor

I think the field name here should be FQDNCacheMinTTL per our conventions

Contributor

@Dyanngg left a comment

I think we should also add unit test cases to verify that an fqdnController with minTTL set computes the correct dnsEntry expiration times in the cache.

@@ -306,6 +306,11 @@ kubeAPIServerOverride: {{ .Values.kubeAPIServerOverride | quote }}
# 10.96.0.10:53, [fd00:10:96::a]:53).
dnsServerOverride: {{ .Values.dnsServerOverride | quote }}

# The fqdnCacheMinTTL setting helps address the problem of applications caching DNS response IPs beyond the TTL value for the DNS record.
Contributor

s/The fqdnCacheMinTTL setting helps address/fqdnCacheMinTTL helps address

Contributor

@hkiiita not addressed correctly, the current sentence is not grammatically correct

@@ -744,3 +747,140 @@ func TestOnDNSResponse(t *testing.T) {
})
}
}
func TestFQDNCacheMinTTL(t *testing.T) {
Contributor

I feel like in theory we only need to test parseDNSResponse, because this is the only place where minTTL is actually used. We just need to make sure that minTTL can override the TTL included in the DNS response. cc @Dyanngg

However, I don't feel super strongly about it, so if others think it is better to test onDNSResponseMsg "end-to-end", it is fine by me.

Member Author

I am waiting for a reply on this one before implementing the nit change requested in the previous comment. Also, I have already pushed a commit (below) that refactors the test to cover only parseDNSResponse, based on this feedback.

name: "Response TTL less than FQDNCacheTTL",
expectedTTL: currentTime.Add(10 * time.Second),
fqdnCacheMinTTL: 10,
dnsMsg: &dns.Msg{
Contributor

There is a lot of repetition: we use the same DNS message every time as far as I can tell (with the only change being the ttl value).
One option is to use a closure such as this one:

getDNSMsg := func(ttl uint32) *dns.Msg {
        return &dns.Msg{ ... }
}

fqdnCacheMinTTL uint32
dnsMsg *dns.Msg
}{
{
Contributor

we should test the IPv6 case (dns.AAAA) as well, just because it is a different code path

fakeClock := newFakeClock(currentTime)
controller := gomock.NewController(t)
f, _ := newMockFQDNController(t, controller, nil, fakeClock, tc.fqdnCacheMinTTL)
require.Zero(t, fakeClock.TimersAdded())
Contributor

is this useful in the context of this test? Please add a short comment if you think it is necessary.

f, _ := newMockFQDNController(t, controller, nil, fakeClock, tc.fqdnCacheMinTTL)
require.Zero(t, fakeClock.TimersAdded())
_, responseIPs, err := f.parseDNSResponse(tc.dnsMsg)
assert.NoError(t, err)
Contributor

this one should be require.NoError

Labels: None yet
Projects: None yet

3 participants