Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

soundwire: add higher bus clock rate support #5160

Open
wants to merge 11 commits into
base: topic/sof-dev
Choose a base branch
from

Conversation

bardliao
Copy link
Collaborator

This reverts commit f2fc74f. To test without this commit. It seems work on my new LNL device. I would like to test with CI to see what the failure scenario is.

@plbossart
Copy link
Member

if you select the 9.6 MHz clock, you also need to change the default_col_size to 8 instead of 4. Otherwise the frame size will not be correct.

@bardliao bardliao force-pushed the test-different-sdw-clock branch 2 times, most recently from 071461c to feca0c2 Compare August 29, 2024 05:43
if (curr_dr_freq == 19200000)
mstr_prop->default_col = 8;
else
mstr_prop->default_col = 4;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart Do you have a rule of selecting frame shape? Currently, I use a hack to test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to start from the root frequency.
19.2: default_row = 50
12: default_row = 125
12.288: default_row = 64

Then you need to take the bus frequency, multiply it by to to get actual DDR rate, divide it by the frame rate and again by default row and you get the default_col.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to start from the root frequency. 19.2: default_row = 50 12: default_row = 125 12.288: default_row = 64

Then you need to take the bus frequency, multiply it by to to get actual DDR rate, divide it by the frame rate and again by default row and you get the default_col.

Changed to default_col = curr_dr_freq / m_rt->stream->params.rate / mstr_prop->default_row;

@bardliao bardliao changed the title Revert "soundwire: intel_auxdevice: start the bus at default frequency" [RFC] soundwire: add higher bus clock rate support Sep 2, 2024
drivers/soundwire/stream.c Outdated Show resolved Hide resolved
drivers/soundwire/stream.c Outdated Show resolved Hide resolved
__func__, curr_dr_freq);
i++;
goto select_clk;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why you would only increase the clock when there is a single lane. The clock should be increased even in the multi-lane case when there's no bandwidth left?

I must be missing something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is about the priority of using multi-lane or higher clock rate. At this point, we tried multi-lane but can't find an available lane and it will return -EINVAL if it reaches the highest clock rate. But yeah, I didn't consider about using multi-lane with higher clock rate. I assume that we always have enough bandwidth if we can find an extra lane. Let me redo this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, lanes should be used before cranking the speed up, but an error should only be thrown when multilane+higher speed combined doesn't support the required bandwidth.

* frequency base and scale registers are required for SDCA
* devices. They may also be used for 1.2+/non-SDCA devices.
* Driver can set the property, we will need a DisCo property
* to discover this case from platform firmware.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not able to follow this comment, consider rewording.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just copy and paste from the existing code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe clarify the comment

"Driver can set the property directly, for now there's no DisCo property to discover support for the scaling registers from platform firmware'

@bardliao bardliao force-pushed the test-different-sdw-clock branch 2 times, most recently from 28b5981 to cf9ed92 Compare September 10, 2024 06:49
@bardliao
Copy link
Collaborator Author

On the second thought, I think the existing code partially support dynamic clock change already. What missed is change frame shape dynamically. So, it will be better if we add the dynamic frame shape support commit first and multi-lane support commits late. It is more reviewable, and the logic is clearer.

drivers/soundwire/bus.c Show resolved Hide resolved
* frequency base and scale registers are required for SDCA
* devices. They may also be used for 1.2+/non-SDCA devices.
* Driver can set the property, we will need a DisCo property
* to discover this case from platform firmware.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe clarify the comment

"Driver can set the property directly, for now there's no DisCo property to discover support for the scaling registers from platform firmware'

drivers/soundwire/stream.c Outdated Show resolved Hide resolved
/* Check if default_col can be changed */
list_for_each_entry(s_rt, &m_rt->slave_rt_list, m_rt_node) {
if (!s_rt->slave->id.class_id) {
dev_err(&s_rt->slave->dev, "The Peripheral doesn't comply with SDCA\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably need a comment to explain that some devices expose the class ID but can't support dynamic scaling.

I would actually move this loop to a helper that can be modified/quirked if needed, e.g. is_clock_scaling_supported().

drivers/soundwire/generic_bandwidth_allocation.c Outdated Show resolved Hide resolved
frame_int = mstr_prop->default_row * mstr_prop->default_col;
frame_freq = curr_dr_freq / frame_int;

if ((curr_dr_freq - (frame_freq * SDW_FRAME_CTRL_BITS)) <
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is partially wrong, because we never use column0 for audio. so for a 50 row frame, this calculation would be off by 2 bits times the frame_freq. I think you need to use curr_dr_freq - default_row * frame_freq.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy from sdw_select_row_col(). Do we need to do the same change in sdw_select_row_col(), too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think so. I don't think anyone can use column0 with 19.2 or 12.228 root frequencies. It's only possible for the 125-row frame but that's such a corner case.

But the formula could be simplifed to avoid a division btw:

curr_dr_freq - (curr_dr_freq / (sdw_rows[r] * sdw_cols[c]) * sdw_rows[r])

curr_d_freq - curr_dr_freq / sdw_cols[c] < bandwidth
curr_d_freq * (sdw_cols[c] - 1) < bandwidth * sdw_cols[c]

return ret;
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clocks are always updated on prepare/deprepare, I am not sure why you want to remove this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we really need to update clocks on both prepare/deprepare. We update clock in sdw_compute_bus_params(), but we still reserve the bit slots for the deprepared stream in sdw_compute_port_params(). That doesn't look correct to me.
This is to see if any issue will happen if I remove this. Looks like CI has told me that compute_params() can not be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me re-explain. We have two bank switches, one on prepare and one on trigger. This is intentional, we want to go to the 'new' frequency and let the clock toggle for some time before enabling all the channels on the trigger.

So the point is that we do need to update the parameters and reduce the clock on deprepare. I am not sure why the prepare and de-prepare parts would become asymmetrical?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me re-explain. We have two bank switches, one on prepare and one on trigger. This is intentional, we want to go to the 'new' frequency and let the clock toggle for some time before enabling all the channels on the trigger.

So the point is that we do need to update the parameters and reduce the clock on deprepare.

Agree, but m_rt is removed after deprepared and list_for_each_entry(m_rt, &bus->m_rt_list, bus_node) is used by _sdw_compute_port_params(), sdw_compute_group_params(), etc. As a result, the bit slots of the deprepared stream are still reserved. If we reduce the clock, we will get a no sufficient bandwidth error.

I am not sure why the prepare and de-prepare parts would become asymmetrical?

I don't think it is asymmetrical. Below is the sequence when we start and stop aplay.
intel_hw_params
-> sdw_stream_add_master
asoc_sdw_prepare
-> sdw_prepare_stream

asoc_sdw_hw_free
-> sdw_deprepare_stream
intel_hw_free
-> sdw_stream_remove_master

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not following this. Do we agree when the stream is deprepared, it should use exactly zero bitslots? I don't understand why the bits would remain reserved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the code above still accounts for the bandwidth of the stream, so the stream is still taken into account.

It makes no sense to me to reduce the bandwidth count after performing the bandwidth allocation.

Sorry, completely lost here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the bandwidth reduction count after performing the bandwidth allocation is based on the fact that the m_rt has not been removed when we allocate the bit slots. To fix the issue, I can either move the bandwidth reduction count after performing the bandwidth as the current PR does or remove the m_rt before allocating the bit slots.
I choose the former because removing m_rt first seems break the add m_rt -> prepare -> deprepare ->remove m_rt sequence.
I know the side effects of moving the bandwidth reduction count after performing the bandwidth allocation is that we may keep the sdw bus clock or lane that is not needed. But it will happen only when we reach the threshold that need higher clock rate or more lane. So, I think it is acceptable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit convoluted really.

The bits for the stream should no longer be accounted for after this routine:

int asoc_sdw_hw_free(struct snd_pcm_substream *substream)
{
	struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream);
	struct sdw_stream_runtime *sdw_stream;
	struct snd_soc_dai *dai;

	/* Find stream from first CPU DAI */
	dai = snd_soc_rtd_to_cpu(rtd, 0);

	sdw_stream = snd_soc_dai_get_stream(dai, substream->stream);
	if (IS_ERR(sdw_stream)) {
		dev_err(rtd->dev, "no stream found for DAI %s\n", dai->name);
		return PTR_ERR(sdw_stream);
	}

	return sdw_deprepare_stream(sdw_stream);
}

but the problem is that we have a half-baked dai handling and comments

static int
intel_hw_free(struct snd_pcm_substream *substream, struct snd_soc_dai *dai)
{
	struct sdw_cdns *cdns = snd_soc_dai_get_drvdata(dai);
	struct sdw_intel *sdw = cdns_to_intel(cdns);
	struct sdw_cdns_dai_runtime *dai_runtime;
	int ret;

	dai_runtime = cdns->dai_runtime_array[dai->id];
	if (!dai_runtime)
		return -EIO;

	/*
	 * The sdw stream state will transition to RELEASED when stream->
	 * master_list is empty. So the stream state will transition to
	 * DEPREPARED for the first cpu-dai and to RELEASED for the last
	 * cpu-dai.
	 */
	ret = sdw_stream_remove_master(&cdns->bus, dai_runtime->stream);
	if (ret < 0) {
		dev_err(dai->dev, "remove master from stream %s failed: %d\n",
			dai_runtime->stream->name, ret);
		return ret;
	}

That comment is wrong, the stream state will transition to DEPREPARED in the dailink hw_free(), and then to RELEASED when the last DAI hw_free() is called

That still doesn't really tell me why the bits are not released during the sdw_deprepare_stream() part. There should be no dependency on the m_rt list.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asoc_sdw_hw_free() is called before intel_hw_free(). IOW, sdw_deprepare_stream() is called before sdw_stream_remove_master(). So, m_rt is not removed yet when sdw_deprepare_stream() is called.

_sdw_deprepare_stream
-> bus->compute_params(bus)
-> sdw_compute_params()
-> sdw_compute_bus_params(bus)
-> sdw_compute_port_params(bus)

                        port_bo = 1;

                        list_for_each_entry(m_rt, &bus->m_rt_list, bus_node) {
                                sdw_compute_master_ports(m_rt, &params[i], &port_bo, hstop);
                        }

                        hstop = hstop - params[i].hwidth;

And we calculate hstart, hstop, and offset in sdw_compute_master_ports()

                t_data.hstart = hstart;
                t_data.hstop = hstop;
                t_data.block_offset = *port_bo;
                t_data.sub_block_offset = 0;
                (*port_bo) += bps * ch;

So that we allocate bit slots for each m_rt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's clearly wrong, the generic bit allocation should take the states into account, and skip the deprepared streams.

See #5177 for compile-tested-only proposal.

@@ -353,6 +391,10 @@ static int sdw_compute_bus_params(struct sdw_bus *bus)
clk_buf = NULL;
}

m_rt = list_last_entry(&bus->m_rt_list, struct sdw_master_runtime, bus_node);
s_rt = list_first_entry(&m_rt->slave_rt_list, struct sdw_slave_runtime, m_rt_node);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not following what this is supposed to do and why you are not dealing with M and S in symmetrical ways, the last/first isn't clear to me at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use the last m_rt is because that one is recently added to the bus and it is the m_rt that we need to handle. And for the s_rt, it doesn't really matter to use what s_rt in the m_rt. The s_rt will be used to find the peripheral lane, and we will check if the lane is connected to all peripherals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's correct @bardliao. When there is a new stream, we need to compute the parameters for all streams and all _rt contexts. Think of it this way: at any bank switch the position of all bits could change. It could very well be that the new stream will have higher priority in the bandwidth allocation than the new one.

IOW, the generic allocation is NOT a purely additive mechanism. At every bank switch, the bus allocation is a blank slate and ALL streams/ports can be re-programmed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to list_for_each_entry(m_rt, &bus->m_rt_list, bus_node) to check all m_rt

@bardliao bardliao force-pushed the test-different-sdw-clock branch 5 times, most recently from 2297111 to e6977d3 Compare September 13, 2024 06:30
@bardliao bardliao marked this pull request as ready for review September 13, 2024 06:31
drivers/soundwire/stream.c Show resolved Hide resolved
@@ -380,6 +406,10 @@ static int sdw_compute_bus_params(struct sdw_bus *bus)
return -EINVAL;
}

/* Get the last m_rt that is recently added to the bus. */
m_rt = list_last_entry(&bus->m_rt_list, struct sdw_master_runtime, bus_node);
mstr_prop->default_col = curr_dr_freq / m_rt->stream->params.rate / mstr_prop->default_row;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no no no, big confusion here!

The frame shape need to be computed with the SoundWire frame rate, which can be different to all the stream rates. Think e.g. of the example I presented with a 96 kHz frame rate to support two streams, one at 48kHz and one at 48 kHz.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, right, fixed now.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks mostly good, I still think the code can be made a bit clearer/compact, see suggestions below.

@@ -354,6 +375,10 @@ static int sdw_compute_bus_params(struct sdw_bus *bus)
clk_buf = NULL;
}

/* If dynamic scaling is supported, don't try higher freq */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logical inversion in comment, should be "NOT supported"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


/* Run loop for all groups to compute transport parameters */
for (i = 0; i < count; i++) {
port_bo = 1;
for (l = 0; l < SDW_MAX_LANES; l++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if we can optimize and only deal with existing lanes. 90% of the time there will be a single lane, it's a bit odd to loop on all possible lanes.

@bardliao bardliao force-pushed the test-different-sdw-clock branch 2 times, most recently from 58739b7 to ffc7d87 Compare September 16, 2024 09:44
@bardliao bardliao changed the title [RFC] soundwire: add higher bus clock rate support soundwire: add higher bus clock rate support Sep 16, 2024
bardliao and others added 4 commits September 16, 2024 20:19
The existing logic is problematic in that we deprepare all the ports,
but still take into account the stream for bit allocation by just
walking through the bus->m_rt list.

This patch sets the state earlier, so that such DEPREPARED streams can
be skipped in the bandwidth allocation (to be implemented in a
follow-up patch).

Signed-off-by: Pierre-Louis Bossart <[email protected]>
We should not blindly walk through all the m_rt list, since it will
have the side effect of accounting for deprepared streams.

This behavior is the result of the split implementation where the
dailink hw_free() handles the stream state change and the bit
allocation, and the dai hw_free() modifies the m_rt list. The bit
allocation ends-up using m_rt entries in zoombie state, not longer
relevant but still used.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
Currently, we only set peripheral frequency when the peripheral is
initialized. However, curr_dr_freq may change to get required bandwidth.
For example, curr_dr_freq may increase from 4.8MHz to 9.6MHz when the
4th stream is opened. Add a helper to get the scale index so that we can
get the scale index and program it.

Signed-off-by: Bard Liao <[email protected]>
@bardliao
Copy link
Collaborator Author

SOFCI TEST

plbossart
plbossart previously approved these changes Sep 19, 2024
drivers/soundwire/stream.c Outdated Show resolved Hide resolved
We need to program bus clock scale to adjust the bus clock if current
bus clock doesn't fit the bandwidth.

Signed-off-by: Bard Liao <[email protected]>
We need to recalculate frame shape when sdw bus clock is changed.
And need to make sure all Peripherals connected to the Manager support
dynamic clock change.

Signed-off-by: Bard Liao <[email protected]>
…w_select_row_col

The bits in Column 0 of Rows 0 to 47 are for control word and can not be
used for audio. In practice, entire Column 0 is skipped.

Signed-off-by: Bard Liao <[email protected]>
Currently, we check curr_dr_freq roughly by "if (curr_dr_freq <=
bus->params.bandwidth)" in sdw_compute_bus_params() and check it
accurately in sdw_select_row_col(). It works if we only support one
freq. But, we need to check it accurately in sdw_select_row_col() to
give it a chance to use a higher freq or use multi-lane.

Signed-off-by: Bard Liao <[email protected]>
If a peripheral supports multi-lane, we can use data lane x to extend
the bandwidth. The patch suggests to select data lane x where x > 0
when bandwidth is not enough on data lane 0.

Signed-off-by: Bard Liao <[email protected]>
All active streams with the same parameters are grouped together and the
params are store in the sdw_group struct. We compute the required
bandwidth for each group. However, each lane has individual bandwidth.
Therefore, we should separate different lanes in different params groups.
Add lane variable to separate params groups.

Signed-off-by: Bard Liao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants