
[BUG] DocumentTranslator - No TargetInputs definition #2251

Open
3 of 19 tasks
haithamshahin333 opened this issue Jul 18, 2024 · 1 comment

Comments

@haithamshahin333

SynapseML version

1.0.4

System information

  • Language version (e.g. python 3.8, scala 2.12):
  • Spark Version (e.g. 3.2.3):
  • Spark Platform (Synapse):

Describe the problem

Cannot call the DocumentTranslator setTargets param in the constructor, and it is unclear what the definition of the TargetInputs object should be in pyspark. How should targetInputs be defined in pyspark to enable calling DocumentTranslator?

https://mmlspark.blob.core.windows.net/docs/1.0.4/scala/com/microsoft/azure/synapse/ml/services/translate/TargetInput.html

Code to reproduce issue

DocumentTranslator()
....
.setTargets([{"targetUrl": "", "language": ""}])

Other info / logs

No response

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations
@mhamilton723
Collaborator

Try using setTargetsCol("colname")

Here's a quick example of how to make a targets column. Note that the provided values are just toy examples to show the syntax and should be adjusted for your application.

from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType, ArrayType


# Define the Glossary schema
glossary_schema = StructType([
    StructField("format", StringType(), True),
    StructField("glossaryUrl", StringType(), True),
    StructField("storageSource", StringType(), True),
    StructField("version", StringType(), True)
])

# Define the TargetInput schema (targetUrl and language are non-nullable)
target_input_schema = StructType([
    StructField("category", StringType(), True),
    StructField("glossaries", ArrayType(glossary_schema), True),
    StructField("targetUrl", StringType(), False),
    StructField("language", StringType(), False),
    StructField("storageSource", StringType(), True)
])

# Sample data for the TargetInput column
data = [
    Row(category="Category1",
        glossaries=[
            Row(format="PDF", glossaryUrl="http://example.com/glossary1.pdf", storageSource=None, version="1.0"),
            Row(format="HTML", glossaryUrl="http://example.com/glossary2.html", storageSource="source1", version=None)
        ],
        targetUrl="http://example.com/target1",
        language="en",
        storageSource="sourceA"),

    Row(category=None,
        glossaries=None,
        targetUrl="http://example.com/target2",
        language="fr",
        storageSource=None)
]


df = spark.createDataFrame(data, schema=target_input_schema)
df.show(truncate=False)
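If you prefer not to build Row objects by hand, spark.createDataFrame also accepts plain Python dicts whose keys match the supplied schema. The helper below is a hypothetical convenience (make_target_input is not part of SynapseML) that fills in the nullable fields for you:

```python
# Hypothetical helper (not part of SynapseML): build a dict shaped like the
# TargetInput schema above. With an explicit schema, spark.createDataFrame
# accepts such dicts in place of Row objects.
def make_target_input(target_url, language, category=None,
                      glossaries=None, storage_source=None):
    # targetUrl and language are the two non-nullable fields in the schema
    return {
        "category": category,
        "glossaries": glossaries,
        "targetUrl": target_url,
        "language": language,
        "storageSource": storage_source,
    }

targets = [
    make_target_input("http://example.com/target1", "en",
                      category="Category1", storage_source="sourceA"),
    make_target_input("http://example.com/target2", "fr"),
]
```

A dataframe built from such dicts with the schema above can then be referenced from setTargetsCol("colname") as suggested at the top of this comment.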
