This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

Catalog code cleanup #1414

Merged
merged 10 commits into from
Jun 27, 2018

Conversation

tli2
Contributor

@tli2 tli2 commented Jun 18, 2018

Addresses issue #1398.

This PR fixes items 1 and 2 from that issue and renames the catalog objects (e.g. LanguageCatalogObject) to entries (e.g. LanguageCatalogEntry). Other minor code-style fixes are included as well.

@coveralls

coveralls commented Jun 18, 2018

Coverage Status

Coverage increased (+0.04%) to 77.899% when pulling 2b04223 on tli2:tianyu-catalog-cleanup into bf7ff62 on cmu-db:master.

apavlo
apavlo previously requested changes Jun 19, 2018
Member

@apavlo apavlo left a comment

Thanks for doing this. Changes are requested.

One potential problem with this PR is that we are switching catalog objects to be called 'entries', but then we are using object identifiers (oid_t) to reference them.

@@ -38,8 +38,10 @@ void BinderContext::AddRegularTable(const std::string db_name,
const std::string table_alias,
concurrency::TransactionContext *txn) {
// using catalog object to retrieve meta-data
auto table_object = catalog::Catalog::GetInstance()->GetTableObject(
db_name, schema_name, table_name, txn);
auto table_object = catalog::Catalog::GetInstance()->GetTableObject(txn,
Member

We should probably rename this as GetTableEntry too.

Contributor Author

@tli2 tli2 Jun 19, 2018

I thought about this and decided to leave it as is, for the following reason. A "Table" is a CatalogEntry inside the TableCatalog, and here we are getting a "Table" from outside of it, which makes sense. Naming this GetTableObject instead of GetTable makes it clear that we are getting the information we have about the table (the table "object" in the system), not the contents of the table itself. (This would make even more sense if we had a glossary somewhere explaining this and used it as a naming convention.) "GetTableEntry" has the same confusing double meaning, and "GetTableCatalogEntry" is unambiguous but long, and doesn't make much sense without looking at the name of the return type. So the function name here is fine, but the type name needs to be changed, because the type "TableObject" wouldn't make sense on its own.

Let me know what you think.

Member

We should vote on this. I think Get*CatalogEntry would be best.

Contributor

I vote for Get*CatalogEntry. I think the accuracy of the function name is more important than its length, and this name is not that long.

Contributor

I'd go with apavlo and ksaito7 and vote for ...Entry.

Contributor Author

Fine, I'll rename them to Get*CatalogEntry.
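To make the agreed convention concrete, here is a minimal sketch using hypothetical stand-in types in a sketch namespace (these are not Peloton's real declarations). The metadata type is a TableCatalogEntry, the accessor is GetTableCatalogEntry, and the transaction pointer comes first, matching the argument reordering in this PR. AddTableCatalogEntry and the in-memory map exist only to make the example runnable.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

namespace sketch {

// Hypothetical stand-ins for Peloton's TransactionContext and oid_t.
struct TransactionContext {};
using oid_t = unsigned int;

// Metadata the catalog holds about a table, not the table's contents.
struct TableCatalogEntry {
  oid_t table_oid;
  std::string table_name;
};

class Catalog {
 public:
  // Hypothetical helper used only to populate the sketch.
  void AddTableCatalogEntry(TransactionContext *txn,
                            const std::string &database_name,
                            const std::string &schema_name,
                            std::shared_ptr<TableCatalogEntry> entry) {
    (void)txn;  // A real catalog would write through this transaction.
    Key key(database_name, schema_name, entry->table_name);
    entries_[key] = std::move(entry);
  }

  // Transaction first, then the lookup key; returns nullptr if absent.
  std::shared_ptr<TableCatalogEntry> GetTableCatalogEntry(
      TransactionContext *txn, const std::string &database_name,
      const std::string &schema_name, const std::string &table_name) const {
    (void)txn;  // A real catalog would read through this transaction.
    auto it = entries_.find(Key(database_name, schema_name, table_name));
    return it == entries_.end() ? nullptr : it->second;
  }

 private:
  using Key = std::tuple<std::string, std::string, std::string>;
  std::map<Key, std::shared_ptr<TableCatalogEntry>> entries_;
};

}  // namespace sketch
```

The name alone now says that the caller receives catalog metadata, which addresses the double meaning discussed above at the cost of a few extra characters.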

Catalog::GetInstance()->CreateTable(
catalog_database_name, catalog_schema_name, catalog_table_name,
std::unique_ptr<catalog::Schema>(catalog_table_schema), txn, true);
Catalog::GetInstance()->CreateTable(txn,
Member

Unless this function also creates the DataTable object (which it shouldn't), maybe we should rename this to CreateTableEntry.

Contributor Author

Yes, sorry I missed this. I will rename this to TableObject or TableEntry, depending on what you think makes sense for the comment above.

Contributor Author

Ironically, I think it does create the DataTable object (not that it should).
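One possible way to separate the two responsibilities raised in this thread is sketched below. All names here (CreateTableCatalogEntry, CreateDataTable, and the stand-in DataTable and TableCatalogEntry structs) are hypothetical illustrations, not Peloton's API: one function records metadata only, another allocates storage only, and a wrapper whose name admits it does both composes them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

namespace sketch {

struct TransactionContext {};  // Hypothetical stand-in.
using oid_t = unsigned int;

// Stand-in for storage::DataTable: where the tuples would live.
struct DataTable {
  oid_t table_oid;
};

// Stand-in for the metadata row describing a table.
struct TableCatalogEntry {
  oid_t table_oid;
  std::string table_name;
};

class Catalog {
 public:
  // Records metadata only; allocates no storage.
  oid_t CreateTableCatalogEntry(TransactionContext *txn,
                                const std::string &table_name) {
    (void)txn;
    oid_t oid = next_oid_++;
    entries_[oid] = TableCatalogEntry{oid, table_name};
    return oid;
  }

  // Allocates storage only, for a table that is already registered.
  DataTable *CreateDataTable(TransactionContext *txn, oid_t table_oid) {
    (void)txn;
    auto &slot = tables_[table_oid];
    slot = std::make_unique<DataTable>(DataTable{table_oid});
    return slot.get();
  }

  // Convenience wrapper that does both, and is named accordingly.
  DataTable *CreateTable(TransactionContext *txn,
                         const std::string &table_name) {
    return CreateDataTable(txn, CreateTableCatalogEntry(txn, table_name));
  }

  std::size_t NumEntries() const { return entries_.size(); }
  std::size_t NumDataTables() const { return tables_.size(); }

 private:
  oid_t next_oid_ = 1;
  std::unordered_map<oid_t, TableCatalogEntry> entries_;
  std::unordered_map<oid_t, std::unique_ptr<DataTable>> tables_;
};

}  // namespace sketch
```

With this split, a name like CreateTableEntry would be accurate for the metadata-only path, which is the concern apavlo raised.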

storage::Database *pg_catalog = nullptr,
type::AbstractPool *pool = nullptr,
concurrency::TransactionContext *txn = nullptr);
static DatabaseCatalog *GetInstance(concurrency::TransactionContext *txn = nullptr,
Member

Can we remove the default values?

Contributor Author

@tli2 tli2 Jun 19, 2018

I tried, but CLion's refactoring feature really doesn't like our code and couldn't handle it. We can open another issue so that somebody who has time can go through these by hand in the future. (I will add this under #1398 later.)
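A minimal sketch of what removing the defaults buys, assuming a simplified DatabaseCatalog that keeps only the txn parameter (the real GetInstance also takes pg_catalog and pool, omitted here). Without "= nullptr", a bare GetInstance() no longer compiles, so every call site has to pass the transaction it actually holds, which would have prevented the GetInstance(nullptr, nullptr, nullptr) calls flagged below. BootstrappedInTransaction is a hypothetical accessor for the example.

```cpp
#include <cassert>

namespace sketch {

struct TransactionContext {};  // Hypothetical stand-in.

class DatabaseCatalog {
 public:
  // No default argument: GetInstance() without a txn is a compile error.
  static DatabaseCatalog *GetInstance(TransactionContext *txn) {
    // Constructed once, on the first call, inside the caller's transaction.
    static DatabaseCatalog instance(txn);
    return &instance;
  }

  // Records whether the one-time bootstrap actually had a transaction.
  bool BootstrappedInTransaction() const { return bootstrapped_in_txn_; }

 private:
  explicit DatabaseCatalog(TransactionContext *txn)
      : bootstrapped_in_txn_(txn != nullptr) {}

  bool bootstrapped_in_txn_;
};

}  // namespace sketch
```

The trade-off is a few more characters at each call site in exchange for the compiler catching every place that forgot the transaction.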


// Insert peloton database into pg_database
DatabaseCatalog::GetInstance()->InsertDatabase(
CATALOG_DATABASE_OID, CATALOG_DATABASE_NAME, pool_.get(), txn);
DatabaseCatalog::GetInstance(nullptr, nullptr, nullptr)->InsertDatabase(txn,
Member

Why pass a null txn pointer to DatabaseCatalog::GetInstance() when you actually have the txn pointer?

Contributor Author

@tli2 tli2 Jun 19, 2018

Oops, that's the CLion refactoring problem I was talking about.

if (txn == nullptr)
throw CatalogException("Do not have transaction to drop schema " +
schema_name);

auto database_object =
DatabaseCatalog::GetInstance()->GetDatabaseObject(database_name, txn);
DatabaseCatalog::GetInstance(nullptr,
Member

Same here. You actually have the txn pointer.

const type::TypeId return_type, oid_t prolang, const std::string &func_src,
std::shared_ptr<peloton::codegen::CodeContext> code_context,
concurrency::TransactionContext *txn) {
void Catalog::AddPlpgsqlFunction(concurrency::TransactionContext *txn,
Member

I don't think this should be called AddPlpgsqlFunction. You are passing in the prolang argument, so it should just be called AddFunction, right? Furthermore, we refer to UDFs as procedures (i.e., the pg_proc table), so it really should be called AddProcedure.

Contributor Author

I had no idea what this was supposed to do. Will change.
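The reasoning in this thread is that the language is already a parameter, so a single AddProcedure entry point can serve every language. A minimal sketch, assuming hypothetical language oids and a stripped-down row type (the real function also takes argument types, a return type and a code context, which are left out here):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

struct TransactionContext {};  // Hypothetical stand-in.
using oid_t = unsigned int;

// Hypothetical language oids; a real catalog would look these up
// in the language catalog rather than hard-coding them.
constexpr oid_t kInternalLangOid = 1;
constexpr oid_t kPlpgsqlLangOid = 2;

// Stripped-down stand-in for a pg_proc row.
struct ProcedureRow {
  std::string name;
  oid_t prolang;
  std::string source;
};

class Catalog {
 public:
  // The language is data passed in via prolang, so no per-language
  // function such as AddPlpgsqlFunction is needed.
  void AddProcedure(TransactionContext *txn, const std::string &name,
                    oid_t prolang, const std::string &source) {
    (void)txn;  // A real catalog would insert inside this transaction.
    procedures_.push_back({name, prolang, source});
  }

  // Counts registered procedures written in the given language.
  std::size_t CountInLanguage(oid_t prolang) const {
    std::size_t count = 0;
    for (const auto &procedure : procedures_) {
      if (procedure.prolang == prolang) ++count;
    }
    return count;
  }

 private:
  std::vector<ProcedureRow> procedures_;
};

}  // namespace sketch
```

Adding support for another language then needs no new Catalog method, only a new row in the language catalog.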

@@ -1188,14 +1411,16 @@ void Catalog::InitializeLanguages() {
auto &txn_manager = concurrency::TransactionManagerFactory::GetInstance();
auto txn = txn_manager.BeginTransaction();
// add "internal" language
if (!LanguageCatalog::GetInstance().InsertLanguage("internal", pool_.get(),
txn)) {
if (!LanguageCatalog::GetInstance().InsertLanguage(txn,
Member

This is not your problem, but we should not be initializing the language table in this function.

Contributor Author

Ack

Member

Can you make an issue for this so that we don't forget it?

Contributor Author

Done. #1421

@ksaito7 ksaito7 mentioned this pull request Jun 26, 2018
Contributor

@pervazea pervazea left a comment

First, let me say that this looks good. The changes in formatting help readability significantly.

I have added comments. The things I feel could use some additional attention (mostly pre-existing issues) are:

  1. Exceptions vs. PELOTON_ASSERT. I haven't analyzed the code in depth, but it looks to me as if there are quite a few locations where exceptions should be asserts. There we are enforcing an internal requirement; it isn't a recoverable run-time error.

  2. LOG_DEBUG. There are unnecessary LOG_DEBUG calls where LOG_TRACE would probably be more appropriate. We should have a discussion about debug logging sometime, because the noise level when one turns on tracing makes it almost useless. To improve that situation, I think LOG_DEBUG should be used sparingly.

  3. Use of the Proc abbreviation in function and class names. I think this should be spelled out as Procedure. While it is mostly clear that it means Procedure and not Process, we should just be explicit.
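For item 2, a toy sketch of why per-item messages belong at TRACE. The Logger class and its runtime threshold are hypothetical stand-ins, not Peloton's LOG_* macros. With the threshold at DEBUG, the per-database bootstrap lines stay hidden and only one summary line remains; at TRACE, everything is shown.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

namespace sketch {

// Lower value = more verbose (TRACE < DEBUG < INFO).
enum class LogLevel { TRACE = 0, DEBUG = 1, INFO = 2 };

class Logger {
 public:
  explicit Logger(LogLevel threshold) : threshold_(threshold) {}

  // Emits the message only if it is at or above the threshold.
  void Log(LogLevel level, const std::string &message) {
    if (level < threshold_) return;
    lines_.push_back(message);
    std::fprintf(stderr, "%s\n", message.c_str());
  }

  const std::vector<std::string> &Lines() const { return lines_; }

 private:
  LogLevel threshold_;
  std::vector<std::string> lines_;
};

// Routine per-database chatter goes to TRACE; DEBUG gets one summary.
void BootstrapDatabases(Logger &log, const std::vector<std::string> &names) {
  for (const auto &name : names) {
    log.Log(LogLevel::TRACE, "Bootstrapping database: " + name);
  }
  log.Log(LogLevel::DEBUG,
          "Bootstrapped " + std::to_string(names.size()) + " databases");
}

}  // namespace sketch
```

With this policy, turning on DEBUG output stays readable as the number of databases grows, while the full detail is still one level away.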

AbstractCatalog::GetResultWithSeqScan(
concurrency::TransactionContext *txn,
expression::AbstractExpression *predicate,
std::vector<oid_t> column_offsets) {
if (txn == nullptr) throw CatalogException("Scan table requires transaction");
Contributor

CatalogException vs. PELOTON_ASSERT.
Should this be an assert? Is it ever legitimate to call this function without a transaction? If not, it should be a PELOTON_ASSERT.
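A hedged sketch of the distinction being asked about: a missing transaction is a caller bug, so it is checked with an assertion, while bad input such as an empty schema name is a recoverable error reported with an exception. SKETCH_ASSERT is a hypothetical stand-in for PELOTON_ASSERT that simply forwards to assert() (and therefore disappears when NDEBUG is defined); SchemaExists is invented for illustration.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for PELOTON_ASSERT; compiled out under NDEBUG.
#define SKETCH_ASSERT(expr) assert(expr)

namespace sketch {

struct TransactionContext {};  // Hypothetical stand-in.

bool SchemaExists(TransactionContext *txn,
                  const std::set<std::string> &schemas,
                  const std::string &schema_name) {
  // Internal invariant: callers must always hold a transaction.
  // Violating it is a programming bug, not a run-time condition.
  SKETCH_ASSERT(txn != nullptr && "Scan table requires transaction");

  // User-facing error: the caller can catch this and report it.
  if (schema_name.empty()) {
    throw std::invalid_argument("schema name must not be empty");
  }
  return schemas.count(schema_name) > 0;
}

}  // namespace sketch
```

The assertion documents the contract at no cost in release builds, while the exception remains for conditions a correct caller can still trigger.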

oid_t index_offset,
std::vector<type::Value> scan_values,
std::vector<oid_t> update_columns,
std::vector<type::Value> update_values) {
if (txn == nullptr) throw CatalogException("Scan table requires transaction");
Contributor

Ditto for the exception vs. PELOTON_ASSERT comment above.

concurrency::TransactionContext *txn) {
ResultType Catalog::CreateSchema(concurrency::TransactionContext *txn,
const std::string &database_name,
const std::string &schema_name) {
if (txn == nullptr)
Contributor

Exception vs. PELOTON_ASSERT

index_name,
{column_id},
true,
IndexType::BWTREE);
LOG_DEBUG("Added a UNIQUE index on %s in %s.", col_name.c_str(),
Contributor

LOG_DEBUG -> LOG_TRACE?

// Check if UDF already exists
auto proc_catalog_obj =
ProcCatalog::GetInstance().GetProcByName(name, argument_types, txn);
ProcCatalog::GetInstance().GetProcByName(txn, name, argument_types);
Contributor

I think it would be clearer and more consistent not to use Proc, so rename it to Procedure:

ProcedureCatalog
GetProcedureByName
InsertProcedure
etc.
IMO the local variables can stay as is, but the class names and class methods should change.
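A minimal sketch of the renamed API, assuming hypothetical stand-in types. It also shows why GetProcedureByName takes argument types as well as a name, as the existing "Check if UDF already exists" call does: overloads share a name, so name plus argument types form the lookup key. The entry type, TypeId values and oid assignment are illustrative, not Peloton's actual definitions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

struct TransactionContext {};  // Hypothetical stand-in.
using oid_t = unsigned int;
enum class TypeId { INTEGER, DECIMAL, VARCHAR };

struct ProcedureCatalogEntry {
  oid_t oid;
  std::string name;
  std::vector<TypeId> argument_types;
};

// Spelled out as "Procedure" rather than "Proc", per the review.
class ProcedureCatalog {
 public:
  // Returns false if a procedure with this name and signature exists.
  bool InsertProcedure(TransactionContext *txn, const std::string &name,
                       const std::vector<TypeId> &argument_types) {
    (void)txn;
    Key key(name, argument_types);
    if (procedures_.count(key) > 0) return false;
    procedures_[key] = std::make_shared<ProcedureCatalogEntry>(
        ProcedureCatalogEntry{next_oid_++, name, argument_types});
    return true;
  }

  // Name alone is ambiguous for overloads, so the signature is part of the key.
  std::shared_ptr<ProcedureCatalogEntry> GetProcedureByName(
      TransactionContext *txn, const std::string &name,
      const std::vector<TypeId> &argument_types) const {
    (void)txn;
    auto it = procedures_.find(Key(name, argument_types));
    return it == procedures_.end() ? nullptr : it->second;
  }

 private:
  using Key = std::pair<std::string, std::vector<TypeId>>;
  oid_t next_oid_ = 1;
  std::map<Key, std::shared_ptr<ProcedureCatalogEntry>> procedures_;
};

}  // namespace sketch
```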

void SystemCatalogs::Bootstrap(const std::string &database_name,
concurrency::TransactionContext *txn) {
void SystemCatalogs::Bootstrap(concurrency::TransactionContext *txn,
const std::string &database_name) {
LOG_DEBUG("Bootstrapping database: %s", database_name.c_str());
Contributor

LOG_TRACE?

LOG_DEBUG("Bootstrapping database: %s", database_name.c_str());

if (!pg_trigger_) {
pg_trigger_ = new TriggerCatalog(database_name, txn);
pg_trigger_ = new TriggerCatalog(txn, database_name);
}

// if (!pg_proc) {
Contributor

If this is dead code, remove?

IndexConstraintType index_constraint);

//===--------------------------------------------------------------------===//
// Members
//===--------------------------------------------------------------------===//

// Maximum column name size for catalog schemas
static const size_t max_name_size = 64;
static const size_t max_name_size_ = 64;
Contributor

Since we are going to change it anyway, let's replace it with a more descriptive name, e.g. max_column_name_size_.
max_name_size is very generic; it could refer to any name.

std::unique_ptr<LanguageCatalogObject> GetLanguageByName(
const std::string &lang_name, concurrency::TransactionContext *txn) const;
std::unique_ptr<LanguageCatalogEntry> GetLanguageByName(concurrency::TransactionContext *txn,
const std::string &lang_name) const;

enum ColumnId {
OID = 0,
LANNAME = 1,
Contributor

Poor abbreviation. It should be LANG or LANGUAGE.

Contributor

@pervazea pervazea left a comment

As per my conversation with Tian Yu, the naming changes are done, and the exception/assert and logging items have been added to a follow-on issue.

@tli2 tli2 added accepted and removed in progress labels Jun 27, 2018
@tli2 tli2 merged commit d22bd24 into cmu-db:master Jun 27, 2018
@tli2 tli2 deleted the tianyu-catalog-cleanup branch June 27, 2018 19:41
mtunique pushed a commit to mtunique/peloton that referenced this pull request Apr 16, 2019
* Catalog code cleanup

* Rename "XXXObject" to "CatalogEntry"

* Rename AddPlpgsqlFunction

5 participants