-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/466 standardization support for second fractions #679
Bugfix/466 standardization support for second fractions #679
Conversation
* Class Section to define parts of the pattern string and string to convert
* Implicit classes additional logic to enable String and Columns to operate with Section class
* Actual implementation of the support for second fractions * S, i and n placeholders in pattern * EnceladusDateTimeParser support for the new placeholders * Standardization (TypeParser) support for the new placeholders
* Fixing and enhancing UTs for the new second fractions placeholders
* Validators to support the new second fractions placeholders
* README.md update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overestimated myself, I am going to rest and finish this tomorrow. Sorry for splitting it.
README.md
Outdated
| _epoch_ | Seconds since 1970/01/01 00:00:00 | 1557136493| | ||
| _milliepoch_ | Milliseconds since 1970/01/01 00:00:00.0000| 15571364938124 | | ||
| _epoch_ | Seconds since 1970/01/01 00:00:00 | 1557136493, 1557136493.136| | ||
| _epochmilli_ | Milliseconds since 1970/01/01 00:00:00.0000| 15571364938124, 15571364938124.001 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does milli have 4 more places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
|
||
**NB!** Spark uses US Locale and because on-the-fly conversion would be complicated, at the moment we stick to this | ||
hardcoded locale as well. E.g. `am/pm` for `a` placeholder, English names of days and months etc. | ||
|
||
**NB!** The keywords are case **insensitive**. Therefore, there is no difference between `epoch` and `EpoCH`. | ||
|
||
**NB!** While _nanoseconds_ designation is supported on input, it's not supported in storage or further usage. So any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put some *
or something to the format or something, that would lead people to this point.
val interim: Column = to_timestamp(stringColumn, pattern) | ||
pattern.defaultTimeZone.map(to_utc_timestamp(interim, _)).getOrElse(interim) | ||
if (pattern.containsSecondFractions) { | ||
//this is a trick how to enforce fractions of seconds into teh timestamp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//this is a trick how to enforce fractions of seconds into teh timestamp | |
//this is a trick how to enforce fractions of seconds into the timestamp |
utils/src/main/scala/za/co/absa/enceladus/utils/general/Section.scala
Outdated
Show resolved
Hide resolved
|
||
object ColumnImplicits { | ||
implicit class ColumnEnhancements(column: Column) { | ||
def isInfinite: Column = { | ||
column.isin(Double.PositiveInfinity, Double.NegativeInfinity) | ||
} | ||
|
||
def zeroBasedSubstr(startPos: Int): Column = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add a basic doc for public members?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, forgot to put them here.
I separated the logical chain of dependencies into separate commits, in hope it would help understanding this PR. The basic logic of the solution is, that the the pattern is checked for the presence of milli-, micro- and/or nanoseconds. If not found, nothing special is needed.
Btw, |
@@ -22,6 +22,21 @@ import scala.annotation.tailrec | |||
object StringImplicits { | |||
implicit class StringEnhancements(string: String) { | |||
|
|||
def injectWith(what: String, where: Int): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add docs here as well and explanation if the string is added on the position where or after it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I wonder if to keep it though. I created the methods, but turned out, I don't need it. Wondered if its useful enough to keep. Opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're not using it, I'd say remove it and consider putting it in commons
instead.
private def makePreciseTimestamp(seconds: Long, nanoseconds: Int): Timestamp = { | ||
val result = new Timestamp(seconds * MillisecondsInSecond) | ||
if (nanoseconds > 0) { | ||
result.setNanos(nanoseconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result.setNanos(nanoseconds) | |
result.setNanos(nanoseconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to need a tutorial on this one lol
...rc/test/scala/za/co/absa/enceladus/utils/validation/field/FieldValidatorTimestampSuite.scala
Outdated
Show resolved
Hide resolved
...rc/test/scala/za/co/absa/enceladus/utils/validation/field/FieldValidatorTimestampSuite.scala
Show resolved
Hide resolved
utils/src/test/scala/za/co/absa/enceladus/utils/validation/field/FieldValidatorDateSuite.scala
Outdated
Show resolved
Hide resolved
utils/src/test/scala/za/co/absa/enceladus/utils/general/SectionSuite.scala
Outdated
Show resolved
Hide resolved
utils/src/main/scala/za/co/absa/enceladus/utils/general/Section.scala
Outdated
Show resolved
Hide resolved
utils/src/main/scala/za/co/absa/enceladus/utils/general/Section.scala
Outdated
Show resolved
Hide resolved
utils/src/main/scala/za/co/absa/enceladus/utils/general/Section.scala
Outdated
Show resolved
Hide resolved
utils/src/main/scala/za/co/absa/enceladus/utils/general/Section.scala
Outdated
Show resolved
Hide resolved
…mments * Removed `injectWith` method from `StringImplicits` as it's not used * `EnceladusDateTimeParser.parseDate` now trims time of the date information * Improved some existing added missing classes and methods documentation * Better UTs * README.md better wording * Code reordering * Typos
| _milliepoch_ | Milliseconds since 1970/01/01 00:00:00.0000| 15571364938124 | | ||
| placeholder | Description | Example | Note | | ||
| --- | --- | --- | --- | | ||
| G | Era designator | AD | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that we have such a good description of pattern characters. I had to look up for this every time I needed to write a pattern. Now README is the place to look. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments as discussed with @benedeki, @yruslan, @GeorgiChochov
} | ||
|
||
/** | ||
* Metrics defined on Section |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number of positions between two sections
(start min forString.length, start + length) | ||
} else { | ||
val startIndex = forString.length + start | ||
if (startIndex >= 0) { // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (startIndex >= 0) { // | |
if (startIndex >= 0) { |
(0, length + startIndex max 0) | ||
} | ||
} | ||
(realStart, after min forString.length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change lines 72
, 75
from infix notation to prefix notation
* @param fromString the string to apply the section to | ||
* @return substring defined by this section | ||
*/ | ||
def extract(fromString: String): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def extract(fromString: String): String = { | |
def extractFrom(string: String): String = { |
Do a similar change the rest of methods as consulted at the meeting
* @return None - if one Section has a negative start and the other positive or zero | ||
* The end of the smaller section subtracted from the start of the greater one (see comparison), e.g. can be negative | ||
*/ | ||
def distance(that: Section): Option[Int] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def distance(that: Section): Option[Int] = { | |
def distance(secondSection: Section): Option[Int] = { |
* The end of the smaller section subtracted from the start of the greater one (see comparison), e.g. can be negative | ||
*/ | ||
def distance(that: Section): Option[Int] = { | ||
def subtractLike(from: Section, what: Section): Int = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def subtractLike(from: Section, what: Section): Int = { | |
def calculateDistance(first: Section, second: Section): Int = { |
} | ||
|
||
object Section { | ||
def fromIndexes(start: Int, end: Int): Section = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add comments to public methods
|
||
object Section { | ||
def fromIndexes(start: Int, end: Int): Section = { | ||
Section(start, end - start + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a check that they are in the right order
Section(start, end - start + 1) | ||
} | ||
|
||
def ofSameChars(fromString: String, fromIndex: Int): Section = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def ofSameChars(fromString: String, fromIndex: Int): Section = { | |
def ofSameChars(inputString: String, signedIndex: Int): Section = { |
} | ||
|
||
def ofSameChars(fromString: String, fromIndex: Int): Section = { | ||
val realFromIndex = if (fromIndex >= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val realFromIndex = if (fromIndex >= 0) { | |
val index = if (fromIndex >= 0) { |
…mments * Section.fromIndexes is now agnostic to input parameters order * Using rather Math.min/max instead of infix operators * Better method names for Section class * Better method parameter names for Section class * Better test descriptions for SectionSuite class
assert(result == sections) | ||
} | ||
|
||
test("mergeSections: touching") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe illustrate an example of to with the explanation why for the given input we get a particular output?
For instance,
For a string:
1234567890ACDFEFGHIJKLMNOPQUSTUVWXYZ
^ ^ ^^
| | ^^ |
| | Section(5,1) Section(-4,2)
| Section(3,2)
Section(1,1)
Output of the merge:
...
etc.
That is so we won't forget your explanation.
|
||
val secondFractionsSections: Seq[Section] | ||
val patternWithoutSecondFractions: String | ||
def containsSecondFractions: Boolean = secondFractionsSections.nonEmpty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is not changing value and si called a few times, it could be turned into lazy val
instead of def.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lazy
having bigger overhead than this simple expression, which might very likely be even inligned by the compiler.
} | ||
|
||
/** | ||
* Inverse function for `remove`, inserts the `what` string into the `into` string as defined by the `section` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the length of the string have to be the same as the length of the section?
Add constraints description to the docs please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, good idea 👍
} else { | ||
val sorted = input.sorted | ||
sorted.tail.foldLeft(List(sorted.head)) { (resultAcc, item) => | ||
if (item touches resultAcc.head) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please change from infix notation to prefix notation
if (item touches resultAcc.head) { | |
if (item.touches(resultAcc.head)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit I really like the inflix here. Makes it easy to read, no issues with precedence. You mind it much? 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind it, I would just keep everything the "same".
* @param sections the sections to merge | ||
* @return an ordered from greater to smaller sequence of distinct sections (their distance is at least 1 or undefined) | ||
*/ | ||
def mergeSections(sections: Seq[Section]): Seq[Section] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def mergeSections(sections: Seq[Section]): Seq[Section] = { | |
def mergeTouchingSections(sections: Seq[Section]): Seq[Section] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, sorry the name comes off as misleading IMO.
And sequence of Sections can be provided. Not just "touching" ones, distinct, overlapping, one included in other, same ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge sections seems like you are going to merge all of them and allow gaps in between them. Let's think of something then.
…mments * Removed exception throwing from the code * Function renamed to better express purpose -> mergeSections->mergeTouchingSectionsAndSort * Further improvement on comments
Wanted to change the microsecond placeholder to u (widely used shorthand for micro prefix). Unfortunately turns out, u is already being used for day of the week. But it made me spot the bug #707 |
Section
to define parts of the pattern string and string to convertSection
classEnceladusDateTimeParser
support for the new placeholdersTypeParser
) support for the new placeholdersREADME.md
update