Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: refactor tables to reuse common fields #36

Merged
merged 6 commits into from
Sep 5, 2024
Merged

Conversation

adklempner
Copy link
Collaborator

@adklempner adklempner commented Aug 7, 2024

Depends on: #44

  • Adds a device type field for all metrics
  • Creates a new table for fields that are common across metrics
  • Removes fields from above in existing tables, adds a foreign key to above
  • Default to SQL time for createdAt
  • Updates types and put/process functions to work with new table and foreign key

@adklempner adklempner marked this pull request as ready for review August 7, 2024 04:23

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, is it possible to introduce some RecordID to each record stored in separate tables, and then introduce a new table that will hold all common fields? And just LEFT JOIN the tables on read.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously, it's a big change, and not required for this PR.
Yet, if we keep adding more fields, would be nice to consider.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, is it possible to introduce some RecordID to each record stored in separate tables, and then introduce a new table that will hold all common fields? And just LEFT JOIN the tables on read.

Would this mean we have to write to two tables each time we receive data?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I don't think that's a problem. Is it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in latest commit

Copy link

@igor-sirotin igor-sirotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements!
I don't have any strong objections, mostly suggestions.
Also, I would recommend to keep this PR only about adding deviceType, and open a separate PR for the commonFields. It would help to to block one change with the other one. And the review would be simpler for the folks :)

}

type ErrorSendingEnvelope struct {
CreatedAt int64 `json:"createdAt"`
Id int `json:"id"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Id int `json:"id"`
ID int `json:"id"`

"github.com/status-im/telemetry/pkg/types"
)

func InsertCommonFields(db *sql.DB, data types.CommonFieldsAccessor) (int, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just pass pointer to CommonFields here? I think using interface here is a bit overcomplicating.

Suggested change
func InsertCommonFields(db *sql.DB, data types.CommonFieldsAccessor) (int, error) {
func InsertCommonFields(db *sql.DB, data *types.CommonFields) (int, error) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't know it could work like this! much simpler, thanks

Comment on lines 1 to 6
ALTER TABLE peercount ADD COLUMN deviceType VARCHAR(255);
ALTER TABLE receivedMessages ADD COLUMN deviceType VARCHAR(255);
ALTER TABLE receivedEnvelopes ADD COLUMN deviceType VARCHAR(255);
ALTER TABLE sentEnvelopes ADD COLUMN deviceType VARCHAR(255);
ALTER TABLE errorSendingEnvelope ADD COLUMN deviceType VARCHAR(255);
ALTER TABLE peerConnFailure ADD COLUMN deviceType VARCHAR(255);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need to add deviceType to all tables?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it in as it's included in separate PR that will be merged first #44

BEGIN

-- Add the new column for the foreign key
EXECUTE format('ALTER TABLE %I ADD COLUMN IF NOT EXISTS commonFieldsId INTEGER', table_name);
Copy link

@igor-sirotin igor-sirotin Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not pushing on terms here, just sharing opniong.

I would call this column just id. And the commonFields, would simply be TelemetryRecord. And then there're difference extensions of these records that reference the main record by ID.

Suggested change
EXECUTE format('ALTER TABLE %I ADD COLUMN IF NOT EXISTS commonFieldsId INTEGER', table_name);
EXECUTE format('ALTER TABLE %I ADD COLUMN IF NOT EXISTS id INTEGER', table_name);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean multiple rows in other tables sharing the same TelemetryRecord? e.g. a row in receivedMessages and a row in peercount with the same value in column id?

I think this would work but would require removing the timestamp from TelemetryRecord and putting it back into the other tables.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, never mind. After second thought on this, I decided to google how to do it right 😄
What we're doing here is called Class Table Inheritance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my suggestion would be just to rename CommonFields to TelemetryRecord (or simply Record).
And consequently rename commonFieldsId -> record_id.

But this is not just my subjective opinion, leaving this at your decision ofc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in latest commit

RAISE NOTICE 'Foreign key constraint already exists on %', table_name;
END;

-- Remove columns that are now in commonFields

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance to migrate current telemetry to the new table as well?

err := db.QueryRow(`
SELECT COUNT(*)
FROM receivedMessages rm
JOIN commonFields cf ON rm.id = cf.id

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it should be rm.commonFieldsId

err = db.QueryRow(`
SELECT COUNT(*)
FROM receivedMessages rm
JOIN commonFields cf ON rm.id = cf.id

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it should be rm.commonFieldsId

@@ -65,7 +67,7 @@ func (r *ReceivedMessage) put(db *sql.DB) error {
}

func queryReceivedMessagesBetween(db *sql.DB, startsAt time.Time, endsAt time.Time) ([]*types.ReceivedMessage, error) {
rows, err := db.Query(fmt.Sprintf("SELECT id, chatId, messageHash, messageId, receiverKeyUID, peerId, nodeName, sentAt, topic, messageType, messageSize, createdAt, pubSubTopic FROM receivedMessages WHERE sentAt BETWEEN %d and %d", startsAt.Unix(), endsAt.Unix()))
rows, err := db.Query(fmt.Sprintf("SELECT id, chatId, messageHash, messageId, receiverKeyUID, sentAt, topic, messageType, messageSize, pubSubTopic FROM receivedMessages WHERE sentAt BETWEEN %d and %d", startsAt.Unix(), endsAt.Unix()))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we LEFT JOIN commonFields here? 🤔

@igor-sirotin
Copy link

@richard-ramos please take a look as well

Comment on lines 31 to 52
func (c CommonFields) GetNodeName() string {
return c.NodeName
}

func (c CommonFields) GetPeerID() string {
return c.PeerID
}

func (c CommonFields) GetStatusVersion() string {
return c.StatusVersion
}

func (c CommonFields) GetDeviceType() string {
return c.DeviceType
}

type CommonFieldsAccessor interface {
GetNodeName() string
GetPeerID() string
GetStatusVersion() string
GetDeviceType() string
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an interface for accesors might be an overkill.
Generally we'd use interfaces for common behavior, decoupling or restricting behavior, but looking at how it's being used in the code, you should be able to access the fields, (i.e. if you have a PeerCount, by calling myPeerCount.CommonFields). An argument could be made for decoupling the database code from the public facing structs, but, unless there's an intention/need to decouple these, we'll end up adding more complexity than required

ID int `json:"id"`
MessageHash string `json:"messageHash"`
SentAt int64 `json:"sentAt"`
CreatedAt int64 `json:"createdAt"`
PubsubTopic string `json:"pubsubTopic"`
Topic string `json:"topic"`
SenderKeyUID string `json:"senderKeyUID"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, should we use json:"senderKeyUID" or json:"senderKeyUid" . I see diff casing for uid in the json naming (like for example on line 65 and 96)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status-go uses UID for all json so I set it to that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess now it's too late to change it anyway, it would be a breaking change

_, err = db.Exec("DROP TABLE IF EXISTS protocolStatsTotals")
if err != nil {
log.Fatalf("an error '%s' was not expected when dropping the table", err)
tables := []string{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this!

log.Fatalf("an error '%s' was not expected when dropping the table", err)
}
for _, table := range tables {
_, err := db.Exec(fmt.Sprintf("DROP TABLE IF EXISTS %s", table))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible to do this:

Suggested change
_, err := db.Exec(fmt.Sprintf("DROP TABLE IF EXISTS %s", table))
_, err := db.Exec("DROP TABLE IF EXISTS $1", table)

Copy link
Collaborator Author

@adklempner adklempner Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fun fact, turns out it only works with:

db.Exec(`DROP TABLE IF EXISTS $1`, table)

edit: nevermind, that doesn't work either. per gpt "In PostgreSQL, you can't use placeholders for table names or other identifiers."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, that's annoying :(

Comment on lines 21 to 34
commonFieldsId, err := InsertCommonFields(db, r.data)
if err != nil {
return err
}

defer stmt.Close()
peerCountStmt, err := db.Prepare(`
INSERT INTO peerCount (common_fields_id, nodeKeyUid, peerCount)
VALUES ($1, $2, $3)
RETURNING id;
`)
if err != nil {
return err
}
defer peerCountStmt.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since both rows being inserte here are dependent on each other, we should use a transaction which should be passed to InsertCommonFields, and used in line 26 too (and maybe in line 37? not sure), with it's commit/rollback depending if there is an error.

@@ -56,24 +58,24 @@ func (r *PeerConnFailure) process(db *sql.DB, errs *MetricErrors, data *types.Te
return err
}

stmt, err := db.Prepare("INSERT INTO peerConnFailure (timestamp, nodeName, nodeKeyUid, peerId, failedPeerId, failureCount, statusVersion, createdAt) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) RETURNING id;")
commonFieldsId, err := InsertCommonFields(db, r.data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about using a trx

topic, receiverKeyUID, nodeName, processingError, statusVersion)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)

commonFieldsId, err := InsertCommonFields(db, r.data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about using a trx

topic, senderKeyUID, peerId, nodeName, publishMethod, statusVersion)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)

commonFieldsId, err := InsertCommonFields(db, r.data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about using a trx

stmt, err := db.Prepare(`INSERT INTO errorSendingEnvelope (messageHash, sentAt, createdAt, pubsubTopic,
topic, senderKeyUID, peerId, nodeName, publishMethod, statusVersion, error)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
commonFieldsId, err := InsertCommonFields(db, e.data.SentEnvelope)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about using a trx

@@ -35,27 +35,29 @@ func (r *ReceivedMessage) process(db *sql.DB, errs *MetricErrors, data *types.Te
}

func (r *ReceivedMessage) put(db *sql.DB) error {
stmt, err := db.Prepare("INSERT INTO receivedMessages (chatId, messageHash, messageId, receiverKeyUID, peerId, nodeName, sentAt, topic, messageType, messageSize, createdAt, pubSubTopic, statusVersion) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13) RETURNING id;")
commonFieldsId, err := InsertCommonFields(db, r.data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about using a transaction

@adklempner adklempner force-pushed the feat/device-type branch 2 times, most recently from 193ea15 to cdaa927 Compare August 16, 2024 06:23
return err
}

stmt, err := db.Prepare("INSERT INTO peerCount (timestamp, nodeName, nodeKeyUid, peerId, peerCount, statusVersion, createdAt) VALUES ($1, $2, $3, $4, $5, $6, $7) RETURNING id;")
tx, err := db.BeginTx(context.Background(), nil)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but it's better to actually pass some real context

@adklempner adklempner changed the title feat: add device type to all metrics feat: refactor tables to reuse common fields Aug 30, 2024
Copy link
Member

@richard-ramos richard-ramos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just left few nitpicks about things that probably werent introduced in this PR and might be worked on later!

@@ -111,6 +89,11 @@ func DropTables(db *sql.DB) {
log.Fatalf("an error '%s' was not expected when dropping the table", err)
}

_, err = tx.Exec("DROP TABLE IF EXISTS telemetryRecord")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this table be included in the list on line 31?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be dropped after all other tables and indexes

Comment on lines 10 to 14
stmt, err := tx.Prepare(`
INSERT INTO telemetryRecord (nodeName, peerId, statusVersion, deviceType)
VALUES ($1, $2, $3, $4)
RETURNING id;
`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in #43 consider using exec, or if this insertion is going to be called frequently, maybe it's a good idea to extract the prepared statement and make it available during the entire duration of the program.

return err
}

stmt, err := db.Prepare("INSERT INTO peerCount (timestamp, nodeName, nodeKeyUid, peerId, peerCount, statusVersion, createdAt, deviceType) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) RETURNING id;")
tx, err := db.BeginTx(context.Background(), nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the context should be passed down. Maybe this one could be used https://github.com/status-im/telemetry/blob/master/telemetry/server.go#L39 ?

Comment on lines 34 to 38
peerCountStmt, err := tx.Prepare(`
INSERT INTO peerCount (common_fields_id, nodeKeyUid, peerCount)
VALUES ($1, $2, $3)
RETURNING id;
`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using exec vs prepare #43

return err
}

tx, err := db.BeginTx(context.Background(), nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the context should be passed down. Maybe this one could be used https://github.com/status-im/telemetry/blob/master/telemetry/server.go#L39 ?

return err
}

stmt, err := db.Prepare("INSERT INTO peerConnFailure (timestamp, nodeName, nodeKeyUid, peerId, failedPeerId, failureCount, statusVersion, createdAt, deviceType) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) RETURNING id;")
stmt, err := db.Prepare("INSERT INTO peerConnFailure (common_fields_id, nodeKeyUid, failedPeerId, failureCount) VALUES ($1, $2, $3, $4) RETURNING id;")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See prev comment about using exec.

return err
}

stmt, err := tx.Prepare(`INSERT INTO receivedEnvelopes (recordId, messageHash, sentAt, pubsubTopic,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See prev comment about using exec.

return err
}

stmt, err := tx.Prepare(`INSERT INTO sentEnvelopes (recordId, messageHash, sentAt, pubsubTopic,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See prev comment about using exec.

stmt, err := db.Prepare(`INSERT INTO errorSendingEnvelope (messageHash, sentAt, createdAt, pubsubTopic,
topic, senderKeyUID, peerId, nodeName, publishMethod, statusVersion, error, deviceType)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
tx, err := db.BeginTx(context.Background(), nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about passing down context

@@ -40,39 +41,53 @@ func (r *ReceivedMessage) Clean(db *sql.DB, before int64) (int64, error) {
}

func (r *ReceivedMessage) Put(db *sql.DB) error {
stmt, err := db.Prepare("INSERT INTO receivedMessages (chatId, messageHash, messageId, receiverKeyUID, peerId, nodeName, sentAt, topic, messageType, messageSize, createdAt, pubSubTopic, statusVersion, deviceType) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) RETURNING id;")
tx, err := db.BeginTx(context.Background(), nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See prev comment about passing down the context

@adklempner
Copy link
Collaborator Author

LGTM. Just left few nitpicks about things that probably werent introduced in this PR and might be worked on later!

Replaced all Prepare with QueryRow (not Exec because Postgres driver doesn't support returning any values this way, and we want the ID) and updated to use passed down context. The latter was really easy to do thanks to the refactoring by @vpavlin !

@adklempner adklempner merged commit e14f337 into master Sep 5, 2024
3 checks passed
@adklempner adklempner deleted the feat/device-type branch September 5, 2024 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants