[WJ-1402] Revamp Wikicomma import script #1980

Merged 133 commits on Jul 11, 2024

Commits on Jun 28, 2024

  1. bc8571e
  2. Start new importer module.

    emmiegit committed Jun 28, 2024
    5534270
  3. Start s3 methods.

    emmiegit committed Jun 28, 2024
    911afe6
  4. Run black formatter.

    emmiegit committed Jun 28, 2024
    19dbb72
  5. 56b8b16
  6. Start process methods.

    emmiegit committed Jun 28, 2024
    b62c732
  7. 1238640
  8. Add user ingest method.

    emmiegit committed Jun 28, 2024
    8d6c915
  9. Update user ingestion code.

    emmiegit committed Jun 28, 2024
    6f56464
  10. 07f3ea6
  11. Add site data ingestion.

    emmiegit committed Jun 28, 2024
    85e40e9
  12. d269cf6
  13. Start site subdirectories.

    emmiegit committed Jun 28, 2024
    6b10458
  14. Add process_pages() stub.

    emmiegit committed Jun 28, 2024
    005897d
  15. 1cab879
  16. Add page data.

    emmiegit committed Jun 28, 2024
    53f8286
  17. Change logging.

    emmiegit committed Jun 28, 2024
    a6f9b63
  18. Fix regex execution.

    emmiegit committed Jun 28, 2024
    4470851
  19. Fix init.

    emmiegit committed Jun 28, 2024
    3f14825
  20. Run black formatter.

    emmiegit committed Jun 28, 2024
    846bdae
  21. Fix add_site().

    emmiegit committed Jun 28, 2024
    6f43459
  22. Fix decorators.

    emmiegit committed Jun 28, 2024
    4615453
  23. Add missing import.

    emmiegit committed Jun 28, 2024
    52c2053
  24. Add another missing import.

    emmiegit committed Jun 28, 2024
    afea615
  25. Fix format string.

    emmiegit committed Jun 28, 2024
    287a9b9
  26. 93e73fc
  27. Skip torrent files.

    emmiegit committed Jun 28, 2024
    8b36ed2
  28. caca641
  29. Fetch site ID from database if present.

    Avoid web downloads if already done.
    emmiegit committed Jun 28, 2024
    6935462
  30. Add site to page log.

    emmiegit committed Jun 28, 2024
    1259a04
  31. a37a740
  32. 6f38f81
  33. Fix typo.

    emmiegit committed Jun 28, 2024
    ba99a8c
  34. Handle missing tag list.

    emmiegit committed Jun 28, 2024
    849d032
  35. Properly convert page slug.

    emmiegit committed Jun 28, 2024
    dd7a11a
  36. Fix insert query.

    emmiegit committed Jun 28, 2024
    02f72b9
  37. Fix page metadata variables.

    emmiegit committed Jun 28, 2024
    46c8b4f
  38. d5050fc
  39. Add text table.

    emmiegit committed Jun 28, 2024
    7602cf8
  40. Add wikitext storage to page revisions.

    Separate table to enable easier intermediate processing.
    emmiegit committed Jun 28, 2024
    fdf9e1b
  41. Add quotes.

    emmiegit committed Jun 28, 2024
    d5c7a08
  42. 24acb5e
  43. Change to properties.

    emmiegit committed Jun 28, 2024
    b53dd1e
  44. Fix queries.

    emmiegit committed Jun 28, 2024
    ed23cb9
  45. e1af90f
  46. 9a65026

Commits on Jun 29, 2024

  1. f7b9601
  2. Update get_page_id() method.

    emmiegit committed Jun 29, 2024
    f465dda
  3. 3a6fe29
  4. f57ab84
  5. 8953cb7
  6. Fix helper method.

    emmiegit committed Jun 29, 2024
    457c137
  7. Get page_id after inserting.

    emmiegit committed Jun 29, 2024
    95791a8
  8. Add log messages.

    emmiegit committed Jun 29, 2024
    f7424f3
  9. 9725a6b
  10. 2dd4b0f
  11. f7dd20d
  12. e9278b6
  13. Fix log file mode.

    emmiegit committed Jun 29, 2024
    7d803b7
  14. Fix argument processing.

    emmiegit committed Jun 29, 2024
    beeca1e
  15. Run black formatter.

    emmiegit committed Jun 29, 2024
    acf3bce
  16. Remove extra newline.

    emmiegit committed Jun 29, 2024
    74e85e6
  17. Unify page table schema.

    emmiegit committed Jun 29, 2024
    4aaa85c
  18. e67b5d2
  19. Call all stubs.

    emmiegit committed Jun 29, 2024
    34ab297
  20. Return s3_path after upload.

    emmiegit committed Jun 29, 2024
    c3cc44d
  21. Add missing import.

    emmiegit committed Jun 29, 2024
    632eb5a
  22. Add s3_hash to file table.

    emmiegit committed Jun 29, 2024
    2f41c63
  23. Move comment placement.

    emmiegit committed Jun 29, 2024
    42b998d
  24. Fix runtime issues in s3.py

    emmiegit committed Jun 29, 2024
    061f0f4
  25. e25308d
  26. a5960cc
  27. Implement file uploads.

    emmiegit committed Jun 29, 2024
    f613241
  28. Remove TODO comment.

    emmiegit committed Jun 29, 2024
    b6d2776
  29. Add forum tables to schema.

    emmiegit committed Jun 29, 2024
    9909238
  30. Allow multiple meta paths.

    emmiegit committed Jun 29, 2024
    1642d13

Commits on Jun 30, 2024

  1. 8406d17
  2. 3600c5c
  3. Remove last_posted_at.

    emmiegit committed Jun 30, 2024
    6977a6d
  4. Fix missing data.

    emmiegit committed Jun 30, 2024
    6402983
  5. Handle missing directory.

    emmiegit committed Jun 30, 2024
    963cb6c
  6. 82876c5
  7. 94567d5
  8. a5084dc
  9. Update schema SQL.

    emmiegit committed Jun 30, 2024
    74c2c72
  10. Start process_post() method.

    emmiegit committed Jun 30, 2024
    14f8895

Commits on Jul 4, 2024

  1. cf2e416
  2. Remove extra newline.

    emmiegit committed Jul 4, 2024
    17de203
  3. 28517e4
  4. Only process revision section if there's data.

    Slight speed-up by avoiding sorting and iteration if there's nothing.
    emmiegit committed Jul 4, 2024
    1b72b17
  5. Removed debug comment line.

    emmiegit committed Jul 4, 2024
    d3c6dce
  6. Insert blob records to SQLite database.

    And foreign key for ensuring consistency.
    emmiegit committed Jul 4, 2024
    f85d282
  7. Store MIME type in SQLite too.

    For the mime_hint column in Wikijump.
    emmiegit committed Jul 4, 2024
    4df6c2a
  8. 210b741
  9. 18eecb8
  10. Fix issues.

    emmiegit committed Jul 4, 2024
    8867a25
  11. Skip missing forum directory.

    This happens if there are no posts.
    emmiegit committed Jul 4, 2024
    763dd84
  12. 8b70757
  13. Run black formatter.

    emmiegit committed Jul 4, 2024
    1ecef2b
  14. 8a081f6
  15. Update message again.

    emmiegit committed Jul 4, 2024
    9a20bf9
  16. Fix add_page_vote().

    emmiegit committed Jul 4, 2024
    514b5b6
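
The Jul 4 commits above mention storing blob records and a MIME type in SQLite, with a foreign key for consistency. A minimal sketch of that idea follows, using Python's standard `sqlite3` module. Only `s3_hash` is a name taken from the commit messages; the table names `blobs` and `files`, the other column names, the helper functions, and the choice of SHA-256 are all assumptions for illustration and may not match the importer's actual schema.

```python
import hashlib
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) tables."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is enabled
    # on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS blobs (
            hex_hash TEXT PRIMARY KEY,
            mime TEXT NOT NULL,
            length INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS files (
            file_id INTEGER PRIMARY KEY,
            filename TEXT NOT NULL,
            -- Every file row must point at a blob that was recorded.
            s3_hash TEXT NOT NULL REFERENCES blobs(hex_hash)
        );
        """
    )
    return conn


def add_blob(conn, data, mime):
    """Record a blob once, keyed by its hash (SHA-256 is an assumption)."""
    hex_hash = hashlib.sha256(data).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO blobs (hex_hash, mime, length) VALUES (?, ?, ?)",
        (hex_hash, mime, len(data)),
    )
    return hex_hash


def add_file(conn, filename, s3_hash):
    """Record a file; fails if s3_hash has no matching blob row."""
    conn.execute(
        "INSERT INTO files (filename, s3_hash) VALUES (?, ?)",
        (filename, s3_hash),
    )
```

With the pragma enabled, inserting a file row whose `s3_hash` has no blob record raises `sqlite3.IntegrityError`, which is the kind of consistency check the commit message describes.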

Commits on Jul 5, 2024

  1. 0b05279
  2. Change database commit order.

    Save the data more frequently.
    emmiegit committed Jul 5, 2024
    6564e93
  3. a1f9eaa
  4. b537368
  5. Fix comparison.

    emmiegit committed Jul 5, 2024
    d129f42
  6. e4c3dc0
  7. 9bcea0b
  8. 1c22dc9
  9. 028fb16
  10. Fix seed syntax.

    emmiegit committed Jul 5, 2024
    1ae7a37
  11. 3ff01f6
  12. 11335d4
  13. f6d62ca
  14. Fix deletion logic.

    emmiegit committed Jul 5, 2024
    5e0af3c
  15. Fix argument.

    emmiegit committed Jul 5, 2024
    15ce770
  16. Log updated fields.

    emmiegit committed Jul 5, 2024
    c435bec
  17. 273e9ae
  18. Only use database blob check.

    S3 has stuff from prior runs.
    emmiegit committed Jul 5, 2024
    622e04e
  19. 77972a0
  20. Ignore un-downloaded files.

    I hate consistency issues >:(
    emmiegit committed Jul 5, 2024
    4cfdfd7
  21. Emit commas for lengths.

    emmiegit committed Jul 5, 2024
    53adddc
  22. Consume missing page for file.

    If it's not in there, then the file must be leftover, from a deleted page.
    emmiegit committed Jul 5, 2024
    fa60b17
  23. Fix logger call.

    emmiegit committed Jul 5, 2024
    19823d3

Commits on Jul 6, 2024

  1. 3998b00
  2. Fix percent type.

    emmiegit committed Jul 6, 2024
    0500c16
  3. Run black formatter.

    emmiegit committed Jul 6, 2024
    f1d02b9
  4. a4c9b6a
  5. Add another default False.

    emmiegit committed Jul 6, 2024
    59b1313

Commits on Jul 7, 2024

  1. 2eac7f0

Commits on Jul 9, 2024

  1. 1ad5d8b

Commits on Jul 10, 2024

  1. af85d70