Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect caclutation for join case #387

Open
yanshuyuan opened this issue Jul 25, 2019 · 1 comment
Open

Incorrect caclutation for join case #387

yanshuyuan opened this issue Jul 25, 2019 · 1 comment

Comments

@yanshuyuan
Copy link

yanshuyuan commented Jul 25, 2019

First, I create the test data

CREATE TABLE ltp.sales ("user_id int,  product   varchar(100),  price     double);  
CREATE TABLE ltp.profile (user_id int, gender varchar(2), age int);

Then I mock some data for them

List<String> productList = Arrays.asList("milk", "egg", "juice");
		for (int i = 0; i < 300000; i++) {
			int randInt = ThreadLocalRandom.current().nextInt(0, 3);
			String product = productList.get(randInt);
			double price = (randInt + 2) * 10
					+ ThreadLocalRandom.current().nextInt(0, 10);
			stmt.execute(String
					.format("INSERT INTO ltp.sales (user_id, product, price) VALUES(%d, '%s', %.0f)",
							i, product, price));
		}

List<String> genderList = Arrays.asList("F", "M");
		for (int i = 0; i < 300000; i++) {
			int randInt = ThreadLocalRandom.current().nextInt(0,
					genderList.size());
			String gender = genderList.get(randInt);
			int age = ((randInt + 2) * 10
					+ ThreadLocalRandom.current().nextInt(0, 10)) % 5;
			stmt.execute(String
					.format("INSERT INTO ltp.profile (user_id, gender, age) VALUES(%d, '%s', %d)",
							i, gender, age));
		}

Now, wo create the scramble:

CREATE SCRAMBLE ltp.profile_uniform_scrambled from ltp.profile METHOD UNIFORM ON user_id RATIO 0.2 BLOCKSIZE 1000
CREATE SCRAMBLE ltp.profile_hash_scrambled from ltp.profile METHOD HASH ON user_id RATIO 0.2 BLOCKSIZE 1000;
CREATE SCRAMBLE ltp.sales_uniform_scrambled from ltp.sales METHOD UNIFORM ON user_id RATIO 0.2 BLOCKSIZE 1000;
CREATE SCRAMBLE ltp.sales_hash_scrambled from ltp.sales METHOD HASH ON user_id RATIO 0.2 BLOCKSIZE 1000;

I start to estimate:

SELECT 
	    product, 
	    SUM(price) as total_price
	FROM (
		SELECT
			product,
			user_id,
			SUM(price) as price
		FROM 
			ltp.sales
		GROUP BY
			product, 
			user_id
		HAVING
			SUM(price) > 10
	) t1 INNER JOIN (
		SELECT 
			user_id
		FROM 
			ltp.profile
		WHERE 
			age IN (1, 2, 3)
			AND 
			gender = 'F'
	) t2
	ON t1.user_id = t2.user_id
	GROUP BY product

the result gap is so large, that's why?

@pyongjoo
Copy link
Member

Sorry for the late reply. To support complex queries in a more principled way, we are working on a new system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants