
Some questions about SCM #2

Open
rjy-fighting opened this issue Oct 28, 2022 · 6 comments

Comments

@rjy-fighting

Hello! I read your article carefully and was very interested in it! I have some questions as follows:

(1) Does the semantic similarity matrix E calculate the semantic similarity between all patches?
(2) After I print E, I find negative values. What does a negative value in E mean? (For example, the value -0.0383 in the first row.)
tensor([[[ 1.0000, 0.3413, 0.3903, ..., 0.1250, -0.0383, 0.1996],
[ 0.3413, 1.0000, 0.4638, ..., 0.0055, 0.0692, 0.2095],
[ 0.3903, 0.4638, 1.0000, ..., 0.0800, -0.1332, 0.2198],
...,

(3) Does SCM diffuse only according to the semantic and spatial relations of the four points that are its first-order neighbors?

Hope to get your reply! Thank you very much!

@rjy-fighting
Author

I have another question: is the evaluation metric GT-Known compared in the paper GT-Known top-1 or GT-Known top-5?

@rjy-fighting
Author

May I know the environment in which the experiments were conducted? (For example, the GPU.)

@hbai98
Owner

hbai98 commented Nov 16, 2022

Hi! Sorry for the late reply.

(1) Yes, the E matrix in Eq.(3) is the normalized outer product of the whole vertex set $V$ with itself, so $E_{i,j}$ denotes the semantic similarity between arbitrary nodes $i$ and $j$.

(2) For arbitrary $i, j$, the cosine similarity $E_{i,j} = \frac{v_i^\top v_j}{\|v_i\|\,\|v_j\|}$ can be negative, since the inner product $v_i^\top v_j$ can be negative or positive.

Wikipedia's definition of cosine distance:
The resulting similarity ranges from −1, meaning exactly opposite, to 1, meaning precisely the same, with 0 indicating orthogonality or decorrelation.

So negative values mean the two patches are not alike.
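Not the repository's implementation, just a minimal NumPy sketch of the idea: row-normalize some hypothetical patch features and take their Gram matrix, which is exactly the cosine-similarity matrix E. The diagonal is 1 and off-diagonal entries can be negative.

```python
import numpy as np

# Hypothetical patch features: N patches with D-dimensional embeddings.
rng = np.random.default_rng(0)
V = rng.standard_normal((8, 16))  # (N, D)

# Row-normalize; the Gram matrix of unit vectors is cosine similarity.
V_hat = V / np.linalg.norm(V, axis=1, keepdims=True)
E = V_hat @ V_hat.T  # (N, N), entries in [-1, 1]

# Each patch is identical to itself, so the diagonal is 1;
# dissimilar patch pairs can produce negative off-diagonal entries.
assert np.allclose(np.diag(E), 1.0)
print(E.min())  # typically negative for random features
```

The same values come out of `torch.nn.functional.cosine_similarity` pairwise; the point is only that negatives are expected whenever two feature vectors point in roughly opposite directions.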

(3) That's correct to some extent. As illustrated in the supplementary materials, for simplicity we only consider the first-order neighbors, i.e., the four adjacent points. (You can experiment with the difference when connecting second-order or further neighbors!)
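For intuition (not the paper's code), here is a sketch of the spatial side of that statement: a binary adjacency matrix over an h×w patch grid where each patch connects only to its four first-order neighbors. The function name and grid size are made up for illustration.

```python
import numpy as np

def first_order_adjacency(h, w):
    """Binary adjacency over an h*w patch grid connecting each patch
    to its four first-order (up/down/left/right) neighbors."""
    n = h * w
    A = np.zeros((n, n), dtype=np.float32)
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    A[i, rr * w + cc] = 1.0
    return A

A = first_order_adjacency(3, 3)
# Corner patches have 2 neighbors, edges 3, the center 4.
print(A.sum(axis=1))
```

Connecting second-order neighbors would amount to adding the diagonal offsets (or taking powers of A), which is the experiment suggested above.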

SCM leverages the semantic and spatial relations to diffuse the raw attention so that it covers the complete object. The critical design is that the semantic relations are constantly updated by the different ADB layers, as shown in Fig. 6; accordingly, the updated E revises the later diffusion status.

I think the most intriguing point is that the diffusion can actually be done with a single layer! I hypothesize that some reinforcement-learning tricks could use one layer and receive a signal after each iteration step in Eq.(6), since we aim to find the intermediate state in which the attention happens to capture the object.

@hbai98
Owner

hbai98 commented Nov 16, 2022

I have another question, that is, is the evaluation metric GT-Known compared in the paper GT-Known top-1 or GT-Known top-5?

GT-Known normally uses the top-1. We give the top-k values only for convenience.

@hbai98
Owner

hbai98 commented Nov 16, 2022


May I know the environment in which the experiment was conducted?(For example,GPU)

We used an A100; its memory (maybe 40 GB, I don't remember exactly) is able to support a batch size of 256.
Let me know if there are further questions.

@rjy-fighting
Author

I benefited a lot! Thank you for your reply!

