From 4a9a1ff7bbf41aa34ada00494db07c52b5faa647 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jo=C3=A3o=20Pedro=20Martins?=
Date: Tue, 14 Dec 2021 22:04:31 +0100
Subject: [PATCH] Add details on the Batch scoring session of Getting Started (#389)

* Revision of getting started guide up to Batch scoring. Also new diagram and fix to ARM template to remove region restrictions.

* Detail on Batch scoring for Getting Started and additional debug message in the copy to ease diagnosing issues

* Tweaked text and added a NOQA for message

Co-authored-by: Joao Pedro Martins
---
 .../scoring/parallel_batchscore_copyoutput.py |  2 +-
 docs/getting_started.md                       | 32 ++++++++++++------
 docs/images/batch-child-run-scoringstep.png   | Bin 0 -> 9057 bytes
 3 files changed, 22 insertions(+), 12 deletions(-)
 create mode 100644 docs/images/batch-child-run-scoringstep.png

diff --git a/diabetes_regression/scoring/parallel_batchscore_copyoutput.py b/diabetes_regression/scoring/parallel_batchscore_copyoutput.py
index cc4af42c..1bcde4b6 100644
--- a/diabetes_regression/scoring/parallel_batchscore_copyoutput.py
+++ b/diabetes_regression/scoring/parallel_batchscore_copyoutput.py
@@ -86,6 +86,6 @@ def copy_output(args):
         or args.output_path is None
         or args.output_path.strip() == ""
     ):
-        print("Missing parameters")
+        print("Missing parameters in parallel_batchscore_copyoutput.py -- Not going to copy inferences to an output datastore")  # NOQA E501
     else:
         copy_output(args)
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 3cd1f263..977fe626 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -286,39 +286,49 @@ The pipeline has the following stage:
 
 ### Set up the Batch Scoring pipeline
 
-In your Azure DevOps project, create and run a new build pipeline based on the [diabetes_regression-batchscoring-ci.yml](../.pipelines/diabetes_regression-batchscoring-ci.yml)
-pipeline definition in your forked repository.
+In your Azure DevOps project, create and run a new build pipeline based on the [.pipelines/diabetes_regression-batchscoring-ci.yml](../.pipelines/diabetes_regression-batchscoring-ci.yml)
+pipeline definition in your forked repository. Rename this pipeline to `Batch-Scoring`.
 
 Once the pipeline is finished, check the execution result:
 
 ![Build](./images/batchscoring-ci-result.png)
 
-Also check the published batch scoring pipeline in the **mlops-AML-WS** workspace in [Azure Portal](https://portal.azure.com/):
+Also check the published batch scoring pipeline in your AML workspace in the [Azure Portal](https://portal.azure.com/):
 
 ![Batch scoring pipeline](./images/batchscoring-pipeline.png)
 
 Great, you now have the build pipeline set up for batch scoring which automatically triggers every time there's a change in the master branch!
 
-The pipeline stages are summarized below:
+The pipeline stages are described in detail below; note that you must complete further configuration to actually see the batch inferences:
 
 #### Batch Scoring CI
 
 - Linting (code quality analysis)
 - Unit tests and code coverage analysis
-- Build and publish *ML Batch Scoring Pipeline* in an *ML Workspace*
+- Build and publish *ML Batch Scoring Pipeline* in an *AML Workspace*
 
 #### Batch Score model
 
 - Determine the model to be used based on the model name (required), model version, model tag name and model tag value bound pipeline parameters.
   - If run via Azure DevOps pipeline, the batch scoring pipeline will take the model name and version from the `Model-Train-Register-CI` build used as input.
   - If run locally without the model version, the batch scoring pipeline will use the model's latest version.
-- Trigger the *ML Batch Scoring Pipeline* and waits for it to complete.
+- Trigger the *ML Batch Scoring Pipeline* and wait for it to complete.
   - This is an **agentless** job. The CI pipeline can wait for ML pipeline completion for hours or even days without using agent resources.
-- Use the scoring input data supplied via the SCORING_DATASTORE_INPUT_* configuration variables, or uses the default datastore and sample data.
-- Once scoring is completed, the scores are made available in the same blob storage at the locations specified via the SCORING_DATASTORE_OUTPUT_* configuration variables.
-
-To configure your own custom scoring data, see [Configure Custom Batch Scoring](custom_model.md#Configure-Custom-Batch-Scoring).
-
+- Create an Azure ML pipeline. The pipeline is created by the code in `ml_service\pipelines\diabetes_regression_build_parallel_batchscore_pipeline.py` and has two steps:
+  - `scoringstep` - this step is a **`ParallelRunStep`** that executes the code in `diabetes_regression\scoring\parallel_batchscore.py` with several different batches of the data to be scored.
+  - `scorecopystep` - this step is a **`PythonScriptStep`** that copies the output inferences from Azure ML's internal storage into a target location in another storage account.
+    - If you run the instructions as defined above with no changes to variables, this step will **not** be executed. You'll see a message in the logs for the corresponding step starting with `Missing parameters`. In this case, you'll be able to find the file with the inferences in the same Storage Account associated with Azure ML, in a location similar to `azureml-blobstore-SomeGuid\azureml\SomeOtherGuid\defaultoutput\parallel_run_step.txt`. One way to find the right path is this:
+      - Open your experiment in Azure ML (by default called `mlopspython`).
+      - Open the run that you want to look at (named something like `neat_morning_qc10dzjy`).
+      - In the graphical pipeline view with 2 steps, click the button to open the details tab: `Show run overview`.
+      - You'll see two steps (corresponding to `scoringstep` and `scorecopystep` as described above).
+      - Click the step with the older "Submitted time".
+      - Click "Output + logs" at the top, and you'll see something like the following:
+      ![Outputs of `scoringstep`](./images/batch-child-run-scoringstep.png)
+      - The `defaultoutput` file will have JSON content with the path to a file called `parallel_run_step.txt` containing the scoring.
+
+To properly configure this step for your own custom scoring data, you must follow the instructions in [Configure Custom Batch Scoring](custom_model.md#Configure-Custom-Batch-Scoring), which let you specify both the location of the files to score (via the `SCORING_DATASTORE_INPUT_*` configuration variables) and where to store the inferences (via the `SCORING_DATASTORE_OUTPUT_*` configuration variables).
+
 ## Further Exploration
 
 You should now have a working set of pipelines that can get you started with MLOpsPython. Below are some additional features offered that might suit your scenario.
diff --git a/docs/images/batch-child-run-scoringstep.png b/docs/images/batch-child-run-scoringstep.png
new file mode 100644
index 0000000000000000000000000000000000000000..6b87f52dbe502d9a9e262aecfa169c24f97301a9
GIT binary patch
literal 9057
[base85-encoded PNG data omitted]