x axis wrong? #28

Open
tejelt opened this issue Jun 26, 2019 · 8 comments

@tejelt

tejelt commented Jun 26, 2019

In the plot that Lena made on the plane, the lambda values of at least a couple of clusters seem to be wrong. Most noticeable are two clusters with Tx>10 which appear in the plot with lambda ~ 35, but in the catalog actually have lambda ~ 55.
Noner2500_temperature-lambda.pdf

sweverett self-assigned this Jun 26, 2019
sweverett added the bug label Jun 26, 2019
sweverett added this to the Someday milestone Jun 26, 2019
@sweverett
Owner

@jjobel @paigemkelly now that we are moving on to the plotting functions, this is something to keep in mind. I haven't double checked yet, but I assume that this has to do with an issue with the pivot. For example, here's what plotting the simple output of the scaled data looks like:
image
These lambdas don't make sense because it's really plotting ln(lambda) - pivot(x) = ln(lambda) - median(ln(lambda)). For the final plots I reverse this scaling in plotlib.py, but there may be a bug that does this incorrectly. Something to keep an eye out for.

@sweverett
Owner

Actually, just looking at that, it seems wrong. Isn't the pivot supposed to be the median of lambda, not the median of the scaled lambda, @tejelt?

@sweverett
Owner

Ok now I'm unsure again. In my mind, the idea of the pivot is to choose the "center" of the data that you are fitting to minimize correlation between your fitted slope & intercept. In that case, it would seem that the choice of median of the scaled lambda is correct.
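
The decorrelation argument can be checked numerically. The following is a minimal sketch, not the project's fitting code: it assumes an unweighted least-squares line fit with equal, independent noise on y, for which the correlation between the fitted intercept and slope is -mean(x) / sqrt(mean(x^2)). The helper name `slope_intercept_corr` and the synthetic richness values are hypothetical.

```python
import math
import random
import statistics


def slope_intercept_corr(x):
    """Correlation between the fitted intercept and slope for abscissae x.

    Assumes unweighted least squares with equal, independent noise on y,
    for which corr(intercept, slope) = -mean(x) / sqrt(mean(x**2)).
    """
    mean_x = statistics.fmean(x)
    mean_x2 = statistics.fmean([v * v for v in x])
    return -mean_x / math.sqrt(mean_x2)


# Synthetic richness values (illustration only, not catalog data).
random.seed(0)
lam = [random.lognormvariate(math.log(40.0), 0.5) for _ in range(201)]

xlog = [math.log(v) for v in lam]
piv = statistics.median(xlog)          # pivot taken in ln(lambda) space
x_centered = [v - piv for v in xlog]

corr_raw = slope_intercept_corr(xlog)          # no pivot subtracted
corr_piv = slope_intercept_corr(x_centered)    # pivot subtracted

print(f"no pivot:   corr = {corr_raw:+.3f}")
print(f"with pivot: corr = {corr_piv:+.3f}")
```

Under these assumptions the correlation is close to -1 without a pivot and much closer to zero once the median of ln(lambda) is subtracted, which is the motivation described above. A different fitting method would give different numbers.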

@tejelt
Author

tejelt commented Sep 27, 2020

What do you mean by scaled lambda? You want ln(lambda/pivot), where pivot is the median lambda. The ln(median(lambda)) should be (roughly) the same as median(ln(lambda)). The plot axes are definitely weird above. I'm also not sure about the y-axis; it must be scaled by something.
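
The "(roughly)" can be made concrete with a short sketch. Because ln is monotonic, the two quantities coincide when the sample has an odd number of points; with an even number, the median averages the two middle values, so the results differ slightly. The lambda values and helper names below are made up for illustration.

```python
import math
import statistics


def ln_of_median(lam):
    """ln(median(lambda))"""
    return math.log(statistics.median(lam))


def median_of_ln(lam):
    """median(ln(lambda))"""
    return statistics.median([math.log(v) for v in lam])


# Hypothetical richness values, not taken from the catalog.
lam_odd = [12.0, 25.0, 40.0, 55.0, 90.0]
lam_even = [12.0, 25.0, 40.0, 55.0, 90.0, 140.0]

# Odd sample size: both pick out the same middle element.
print(ln_of_median(lam_odd), median_of_ln(lam_odd))

# Even sample size: arithmetic vs. geometric mean of the two middle values.
print(ln_of_median(lam_even), median_of_ln(lam_even))
```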

@sweverett
Owner

sweverett commented Sep 27, 2020

Ignore the y, this was just a test plot the rewrite branch made to make sure it was running. Don't take the numbers seriously.

I went through the math and think I've discovered my confusion. In my mind, the whole point of a pivot is to shift the data distribution to the center to minimize the correlation on slope & intercept. So I visualized it like this:
y = m * (x' - x_0) + b
where x_0 is the pivot. With this definition, x_0 = pivot = med(x') = med(ln(lambda)), where x' is the lambda in the scaled space (what I meant by "scaled lambda"). However, I could not get this to work consistently with ln(lambda/pivot). It looks like what people instead do is the following:
x_0 = med(ln(lambda)) = ln(lambda_p) = ln(pivot)
where lambda_p is the lambda corresponding to the median in the scaled ln space, x'. So I think this all came down to me thinking that x_0 was the pivot (which is normally the convention when you're just dealing with a linear fit outside of any transformations), whereas here it is lambda_p.
Does any of that make sense?

@sweverett
Owner

sweverett commented Sep 27, 2020

Here's a shorter version of my argument, starting from the usual definition:

L = a * (lambda / pivot) ^ b
ln(L) = ln(a) + b*[ln(lambda) - ln(pivot)]
y = intercept + slope*(x' - x_0)
y = intercept + slope*x

Thus x_0 is not the pivot referenced by the usual equation, and so the pivot that clustr.py computes:

# Log-x before pivot
xlog = np.log(data.x)

# Set pivot
if piv_type == 'median':
    piv = np.median(xlog)

# Scale log_x by pivot
log_x = xlog - piv

is inconsistent with the usual equation.
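
To see that the difference is one of naming rather than of numbers, here is a minimal sketch comparing the two conventions: subtracting x_0 = median(ln(lambda)) in log space versus dividing by lambda_p = exp(x_0) before taking the log. It uses only the standard library rather than NumPy, and the lambda values are hypothetical.

```python
import math
import statistics

# Hypothetical richness values, not taken from the catalog.
lam = [12.0, 25.0, 40.0, 55.0, 90.0]

xlog = [math.log(v) for v in lam]
x_0 = statistics.median(xlog)   # what the quoted snippet stores as `piv`
lambda_p = math.exp(x_0)        # the pivot in the usual equation, in lambda units

# Convention used in the quoted snippet: shift in log space.
scaled_by_shift = [v - x_0 for v in xlog]

# Convention in the usual equation: ln(lambda / pivot).
scaled_by_ratio = [math.log(v / lambda_p) for v in lam]

for a, b in zip(scaled_by_shift, scaled_by_ratio):
    print(f"{a:+.6f}  {b:+.6f}")
```

The two columns agree to floating-point precision, so the fitted x values are the same under either convention; only the quantity called "pivot" differs (x_0 versus lambda_p).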

@sweverett
Owner

sweverett commented Sep 27, 2020

Now this difference in definition may not actually matter. Here is the unscale() function in the plotting code, which takes the data in the fitted (x,y) space to the original (lambda, L) space:

def unscale(x, y, x_err, y_err, x_piv):
    ''' Recover original data from fit-scaled data '''
    return (np.exp(x + x_piv), np.exp(y), x_err * x, y_err * y)

This transformation is completely consistent with my definition of x_piv from above, as:

ln(L) = ln(a) + b*[ln(lambda) - ln(lambda_p)]
-> y = intercept + slope * [ln(lambda) - x_piv]
-> y = intercept + slope * x
-> lambda = e ^ (x + x_piv)

So I don't see how the pivot would cause an incorrect lambda in the plots. But we can easily double check this with some tests.
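
One such test could be a round trip: scale the data, unscale it, and compare with the input. The sketch below is an assumption-laden stand-in, not the repository's code: `scale` and `unscale_values` are hypothetical helpers that mirror the quoted snippets using the standard library, and the error terms are left out because the thread does not show how the errors are scaled.

```python
import math
import statistics


def scale(lam, lum):
    """Mirror of the quoted scaling: x = ln(lambda) - median(ln(lambda)), y = ln(L)."""
    xlog = [math.log(v) for v in lam]
    x_piv = statistics.median(xlog)
    x = [v - x_piv for v in xlog]
    y = [math.log(v) for v in lum]
    return x, y, x_piv


def unscale_values(x, y, x_piv):
    """Central values only: lambda = exp(x + x_piv), L = exp(y)."""
    lam = [math.exp(v + x_piv) for v in x]
    lum = [math.exp(v) for v in y]
    return lam, lum


# Hypothetical (lambda, L) pairs, not taken from the catalog.
lam_in = [12.0, 25.0, 40.0, 55.0, 90.0]
lum_in = [0.3, 1.1, 2.4, 4.0, 9.5]

x, y, x_piv = scale(lam_in, lum_in)
lam_out, lum_out = unscale_values(x, y, x_piv)

for a, b in zip(lam_in, lam_out):
    print(f"{a:8.3f} -> {b:8.3f}")
```

If the round trip reproduces the input lambdas, as it does here, a lambda of about 55 cannot turn into about 35 through this transformation alone.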

@sweverett
Owner

The plot_scatter function takes the data from the loaded catalog and plots it directly with errorbar(); it doesn't even interact with any of the scaling or unscaling functions. So any bug in the displayed lambda values would have to come from the catalog reading itself, which I find much less likely.

The place I was worried about this was if I was displaying lambda / lambda_piv incorrectly on the plots by using x_piv. However, it looks like I was lazy and just had it print out x / x_piv:
image
Thus, so far I don't see any bugs or inconsistencies other than vocabulary.


2 participants