Hans Hübner | 28 Jun 06:45 2016

[MarkLogic Dev General] Bulk updates (xqsync vs. mlcp)

Hi,

We're planning to use MarkLogic for regular bulk updates on a large set of documents (~1 million).  Many of the documents will be unchanged from their previous version, and we'd like to avoid reinserting them, as we want to use the point-in-time query feature to track document changes over time.  I've read an old thread on this list that suggested calculating a checksum over each input document and writing it to the database only if the previous version's checksum differs.  The same thread also suggested that xqsync could be used.

Now xqsync apparently was replaced by mlcp, and I can find an indication in the mlcp documentation that it avoids writing unchanged documents.

Can anyone suggest the best way to approach this?

Thanks!
Hans
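
A minimal sketch of the checksum approach mentioned above, in Python for illustration only (the actual load would still go through XQuery or mlcp). It assumes you can look up, per URI, the checksum recorded at the previous load; where that is stored (a document property, a separate manifest, ...) is left open, and `content_checksum` and `select_changed` are hypothetical names. Hashing the C14N form means attribute order and whitespace around text content do not count as changes; drop `strip_text=True` if that whitespace is significant in your documents.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Tuple


def content_checksum(xml_text: str) -> str:
    """SHA-256 of the C14N 2.0 form, so attribute order and whitespace
    around text content do not change the checksum."""
    canonical = ET.canonicalize(xml_data=xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_changed(
    incoming: Iterable[Tuple[str, str]],
    stored_checksums: Dict[str, str],
) -> List[str]:
    """Return the URIs that are new or whose content changed.

    stored_checksums (hypothetical) maps URI -> checksum from the previous
    load; it is updated in place for every URI selected for writing.
    """
    changed: List[str] = []
    for uri, xml_text in incoming:
        digest = content_checksum(xml_text)
        if stored_checksums.get(uri) != digest:
            changed.append(uri)
            stored_checksums[uri] = digest
    return changed
```

Only the URIs returned by `select_changed` would then be written, so unchanged documents keep their existing versions for point-in-time queries.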

Karthik.Nagarajan2 | 23 Jun 12:18 2016

[MarkLogic Dev General] Customise and override content API's authentication and authorization

Hi,


Is it possible to add an interception script or layer on top of the content APIs to perform an additional level of authentication before the request is forwarded to the MarkLogic server?

I want to authenticate with some external system before the request reaches the ML server.


Thanks,
Karthik
This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
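
One common way to build such a layer is a small gateway in front of the MarkLogic REST port that checks each request against the external system and only then forwards it. The Python WSGI middleware below is a sketch under those assumptions: `is_authorized` stands in for the call to your external system, `downstream` for whatever forwards the request to MarkLogic (for example a reverse proxy), and the `X-External-Token` header is a hypothetical name chosen so it does not collide with the `Authorization` header MarkLogic uses for its own authentication.

```python
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


class ExternalAuthGate:
    """WSGI middleware that rejects requests the external system does not approve."""

    def __init__(self, downstream: WSGIApp, is_authorized: Callable[[str], bool]) -> None:
        self.downstream = downstream        # e.g. a proxy that forwards to MarkLogic
        self.is_authorized = is_authorized  # hypothetical call to the external auth system

    def __call__(self, environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        # WSGI exposes the hypothetical X-External-Token header as HTTP_X_EXTERNAL_TOKEN.
        token = environ.get("HTTP_X_EXTERNAL_TOKEN", "")
        if not token or not self.is_authorized(token):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Approved: pass the request on unchanged, including its Authorization header.
        return self.downstream(environ, start_response)
```

The gate stays independent of MarkLogic: every request that gets through is still authenticated and authorized by MarkLogic itself.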
Stephane.VARIN | 23 Jun 11:19 2016

[MarkLogic Dev General] format:json && extract-document-data

Hi,

I am trying to include some document data in my search results, using the following query options:

<options xmlns="http://marklogic.com/appservices/search">
    <extract-document-data selected="include">
          <extract-path>/language-version/language-version-canonical-model/title</extract-path>
          <extract-path>/language-version/language-version-canonical-model/language</extract-path>
(…)
    </extract-document-data>
</options>

Unfortunately, when I ask for JSON format (using the header Accept: application/json), the extracted elements come back as “stringified” XML instead of being converted to JSON as I would have expected:

{
  "snippet-format": "snippet",
  "total": 564,
  "start": 1,
  "page-length": 10,
  "selected": "include",
  "results": [
    {
      "index": 1,
      "uri": "ENV/CHEM/NANO(2015)22/ANN5/2",
      "path": "fn:doc(\"ENV/CHEM/NANO(2015)22/ANN5/2\")",
(…)
      "extracted": {
        "kind": "element",
        "content": [
          "<language>En</language>",
          "<title>ZINC OXIDE DOSSIERANNEX 5</title>",
          "<reference>ENV/CHEM/NANO(2015)22/ANN5</reference>",
          "<classification>2</classification>",
          "<modificationDate>2015-04-16T00:00:00.000+02:00</modificationDate>",
          "<subject label_en=\"media\">media</subject>",
          "<subject label_en=\"fish\">fish</subject>",
(…)
        ]
      }
    },

Am I doing anything wrong? Is there a configuration option I could tweak to force the conversion of XML to JSON?

Cheers,
Stéphane Varin
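
If no server-side option turns up, one possible client-side workaround is to parse each string in `extracted.content` and map it to JSON yourself. The Python sketch below uses a hypothetical convention (attributes as `@name`, text as `#text` when an element also has attributes or children, repeated child elements as lists); it is not MarkLogic's own XML-to-JSON mapping, and it ignores mixed content (text between child elements).

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def _convert(el: ET.Element) -> Any:
    """Map an element to a string (text only) or a dict (attributes/children)."""
    node: Dict[str, Any] = {"@" + name: value for name, value in el.attrib.items()}
    for child in el:
        value = _convert(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            node[child.tag] = [node[child.tag], value]  # second occurrence: make a list
    text = (el.text or "").strip()
    if not node:
        return text
    if text:
        node["#text"] = text
    return node


def extracted_to_json(content: List[str]) -> List[Dict[str, Any]]:
    """Convert the stringified XML fragments found in "extracted" -> "content"."""
    result = []
    for fragment in content:
        el = ET.fromstring(fragment)
        result.append({el.tag: _convert(el)})
    return result
```

Namespaced elements would come out with ElementTree's `{uri}local` keys, so a real mapping would need a prefix policy.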

fhcj.jansen | 22 Jun 17:47 2016

[MarkLogic Dev General] MLCP import transform and search

Hi,

Is it possible, in an MLCP IMPORT with a TRANSFORM, to search the (same) database?

I use the transform option to enrich the data with data from the same database.
But with the transformation below, the raw data is loaded into the database with empty elements a, b and c (the structure is OK).
Questions:
- is it possible to do an MLCP IMPORT with a search in a transform (XQuery) and get data from the same (sub-)database?
- or do I have to adjust my security role (which role do I need? currently I use the admin role)?
- or .............


MLCP call:

mlcp.sh import \
       -mode local \
       -host localhost \
       -port $PORT \
       -username $USER \
       -password $PASS \
       -input_file_path /home/cent/Desktop/$SRCS \
       -document_type xml  \
       -input_file_type aggregates \
       -aggregate_record_element record \
       -database Dummy \
       -content_encoding "UTF-8" \
       -namespace http://ml.com/$SRCS \
       -output_collections $COLL \
       -transform_module /src/transform/transform.xqy \
       -transform_namespace http://ml.com/mlcp-transform \
       -transform_param $PARM \
       -tolerate_errors true \
       -xml_repair_level full




/src/transform/transform.xqy:

xquery version "1.0-ml";
module namespace ingest = "http://ml.com/mlcp-transform";
declare function ingest:transform(
  $content as map:map,
  $context as map:map
)
{
  let $doc         := map:get($content, "value")

(: Normally the searches below are more specific, based on element values in $doc :)

  let $a := cts:search(fn:doc(), cts:collection-query("docs"), "unfiltered")
  let $b := cts:uris((), (), cts:collection-query("docs"))
  let $c := fn:collection("docs")[1]
(: removed transformation part                     :)
(:   but $a, $b and $c are empty                   :)
(:   the same expressions give results in QConsole :)

  let $_1 := <new-record>
               <raw>{$doc}</raw>
               <a>{$a}</a>
               <b>{$b}</b>
               <c>{$c}</c>
             </new-record>
  let $_2 := map:put($content, "value", $_1)
  return $content
};



Kind Regards,
Frank Jansen

This message has been sent by ABN AMRO Bank N.V., which has its seat at Gustav Mahlerlaan 10 (1082 PP) Amsterdam, the Netherlands, and is registered in the Commercial Register of Amsterdam under number 34334259.
Rob Walpole | 22 Jun 12:13 2016

Re: [MarkLogic Dev General] Validation error when loading pipeline with options via Management API

Hi Indy,

Yes, I think it's some kind of namespace issue. However, we have now found another solution: I realised we don't need file system access to load the pipeline, since it can just as easily be loaded from the modules database, which we are populating using mlcp.

Thanks again
Rob


On 22 June 2016 at 10:28, Indrajeet Verma <indrajeet.verma-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

Hi Rob,

I have never tried loading a pipeline via the APIs, but judging from the error it looks like a namespace problem: the pipeline elements are in one namespace, while, as per the docs, options is in a different namespace.

If possible, could you share your code so I can take a look? If not, I am sure some other ML experts can help you with this.

Regards,
Indy
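
The namespace diagnosis can be checked with any namespace-aware parser: an unprefixed element inherits the default namespace in scope, so a bare `<options>` inside the pipeline properties lands in the pipeline-properties namespace (the `pp:options` the schema rejects in the first error), whereas `xmlns="..."` on `<options>` moves it out. The Python sketch below uses a simplified payload, not the full Management API document, and only illustrates the first error; it does not explain the second one reported against pipelines.xsd.

```python
import xml.etree.ElementTree as ET

# Namespace of the Management API payload, taken from the first error message.
PP_NS = "http://marklogic.com/manage/pipeline/properties"
MIMETYPE_NS = "/MarkLogic/cpf/actions/mimetype-condition.xqy"

# Simplified payload: only the elements on the path named in the error.
TEMPLATE = (
    '<pipeline-properties xmlns="{pp}">'
    "<condition>"
    "<module>/MarkLogic/cpf/actions/mimetype-condition.xqy</module>"
    "{options}"
    "</condition>"
    "</pipeline-properties>"
)


def options_namespace(payload: str) -> str:
    """Return the namespace URI the <options> element actually ends up in."""
    root = ET.fromstring(payload)
    options = root.find(".//{*}options")  # any namespace (Python 3.8+)
    if options is None:
        raise ValueError("no <options> element found")
    # ElementTree spells a namespaced tag as "{uri}local".
    return options.tag[1:].split("}", 1)[0] if options.tag.startswith("{") else ""


bare = TEMPLATE.format(
    pp=PP_NS,
    options="<options><mime-type>application/xml</mime-type></options>",
)
redeclared = TEMPLATE.format(
    pp=PP_NS,
    options=f'<options xmlns="{MIMETYPE_NS}"><mime-type>application/xml</mime-type></options>',
)
```

`options_namespace(bare)` reports the pipeline-properties namespace, while `options_namespace(redeclared)` reports the mimetype-condition namespace.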

On 22-Jun-2016 2:44 pm, "Rob Walpole" <robkwalpole <at> gmail.com> wrote:
Hi Indy,

Thanks for your reply. Yes, we have tried loading the pipeline via the Admin UI and it works fine using the format you describe. However, we really want to use the Management API, as the Admin UI requires file system access on the server where MarkLogic is running, and this will not always be available to us.

Rob


On 22 June 2016 at 10:05, Indrajeet Verma <indrajeet.verma-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Hi Rob,

Have you tried loading the pipeline via the Admin UI, and do you see the same error?

I am using ML 8.0-4, and the code below works with the options; however, I loaded the pipeline via the UI.

    <state-transition>
        <annotation>
            When a document is zip, 
        </annotation>
        <state>http://marklogic.com/states/initial</state>
        <on-success>http://marklogic.com/states/transformed</on-success>
        <on-failure>http://marklogic.com/states/error</on-failure>
        <execute>
            <condition>
                <module>/MarkLogic/cpf/actions/mimetype-condition.xqy</module>
                <options xmlns="/MarkLogic/cpf/actions/mimetype-condition.xqy">
                    <mime-type>application/zip</mime-type>
                </options>
            </condition>
            <action>
                <module>action/extract-zip.xqy</module>
            </action>
        </execute>
    </state-transition>

Regards,
Indy

On Wed, Jun 22, 2016 at 2:11 PM, Rob Walpole <robkwalpole-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Hi,

We are trying to load a CPF pipeline via the RESTful Management API, and this fails with an invalid node error when there is an options element present within /pipeline-properties/state-transition/execute/condition.

The full error message is as follows:

<error xmlns="http://marklogic.com/xdmp/error">
  <status-code>400</status-code>
  <status>Bad Request</status>
  <message-code>MANAGE-INVALIDPAYLOAD</message-code>
  <message>MANAGE-INVALIDPAYLOAD: (err:FOER0000) Payload has errors in structure, content-type or values. XDMP-VALIDATEUNEXPECTED: (err:XQDY0027) validate strict { $pipeline } -- Invalid node: Found pp:options but expected any(lax,!(http://marklogic.com/manage/pipeline/properties))? at fn:doc("")/pp:pipeline-properties/pp:state-transition/pp:execute/pp:condition/pp:options using schema "manage-pipeline-properties.xsd"</message>
</error>

The problem node looks like this:

<condition>
    <module>/MarkLogic/cpf/actions/mimetype-condition.xqy</module>
    <options>
        <mime-type>application/xml</mime-type>
    </options>
</condition>

We have also tried writing the options element as:

<options xmlns="/MarkLogic/cpf/actions/mimetype-condition.xqy">
    <mime-type>application/xml</mime-type>
</options>

...which is how it is described in the Content Processing Framework Guide, but this fails with a different error, as follows:

<error xmlns="http://marklogic.com/xdmp/error">
  <status-code>500</status-code>
  <status>Internal Server Error</status>
  <message-code>XDMP-VALIDATEUNEXPECTED</message-code>
  <message>XDMP-VALIDATEUNEXPECTED: (err:XQDY0027) validate strict { $pipeline } -- Invalid node: Found p:options but expected any(lax,!(http://marklogic.com/cpf/pipelines))? at /p:pipeline/p:state-transition/p:execute/p:condition/p:options using schema "pipelines.xsd"</message>
</error>

If we remove the options element, the pipeline loads via the API with no problems.

We are using MarkLogic Server Enterprise Edition 8.0-4.2.

Many thanks
Rob Walpole

_______________________________________________
General mailing list
General <at> developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general





Naoufal Abdenim | 17 Jun 15:50 2016

[MarkLogic Dev General] remove me please

Hi,

Please remove me; my company doesn't use MarkLogic anymore. Thank you.

Best regards
Paul M | 17 Jun 15:15 2016

[MarkLogic Dev General] running xquery with large result

What would be the best alternative to QConsole for running an XQuery with a large result? Preferably outside the browser: a curl command, or something in a shell.

Thanks so much,
Paul
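
One script-friendly option, assuming MarkLogic 8's REST API on port 8000 and a user holding the eval privileges, is to POST the query to the `/v1/eval` endpoint. The Python sketch below builds that request and streams the response to a file, so a large result is never held in memory; host, port and digest authentication are assumptions about your setup, and the response is typically multipart/mixed with one part per result item. `build_eval_request` and `run_eval_to_file` are hypothetical helper names.

```python
import shutil
import urllib.parse
import urllib.request


def build_eval_request(xquery: str, host: str = "localhost", port: int = 8000) -> urllib.request.Request:
    """POST the query as the form field "xquery" to the REST /v1/eval endpoint."""
    body = urllib.parse.urlencode({"xquery": xquery}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/v1/eval",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def run_eval_to_file(xquery: str, out_path: str, user: str, password: str,
                     host: str = "localhost", port: int = 8000) -> None:
    """Run the query and stream the (multipart) response to out_path."""
    request = build_eval_request(xquery, host, port)
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, f"http://{host}:{port}/", user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(request) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)  # chunked copy; never holds the whole result
```

The curl equivalent would be along the lines of `curl --anyauth --user USER:PASS -X POST --data-urlencode xquery@query.xqy http://localhost:8000/v1/eval -o result.txt`.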

Ron Hitchens | 16 Jun 22:18 2016

[MarkLogic Dev General] Mysterious, Dramatic Query Slowdown on Multi-Node Cluster


   We’re seeing a very odd phenomenon on a client project here in the UK.  Queries (as in read-only, no
updates) slow down dramatically (from 1.2 seconds to 30-40 seconds or longer) while “Jobs” are
running that do relatively light updates.  But only on multi-node clusters (3 node in this case).

Details:
MarkLogic 8.0-3.2
Production (pre-launch): A three node cluster running on Linux in AWS
JVM app nodes (also in AWS) that perform different tasks, talking to the same ML cluster

QA is a single E+D MarkLogic node in AWS

   The operational scenario is this.

o Prod cluster (3 nodes) has about 14+ million documents (articles and books).
o Some number of “API app nodes” which present a REST API dedicated to queries
o Some number of “worker bee” nodes that process batch jobs for ingestion and content enrichment

   The intention is that the worker bees handle the slow, lumpy work of processing and validating content
before ingesting it into ML.  There is a job processing framework that is used on the worker bees to queue,
throttle and process jobs asynchronously.

   The API nodes respond to queries from the web app front end and other clients within the system to do
searches, fetch documents, etc.  These, for the most part, are pure queries that don’t do any updates.

   The issue we’ve bumped up against is this: We have a worker bee job that enriches content by, for a
particular content document (such as an article), taking each associated binary and submitting it to a
thumbnail service.  A thread then polls the service until the results are ready.  Those results are then
written to the content document with URIs of the thumbnail images.

   In the course of processing these jobs, this is what happens (several can run at once, but we see this problem
even with only one running):

   o A job is pulled off the queue.  The queue is just a bunch of job XML documents in ML.
   o The job’s state is updated to running in its XML doc
   o Code starts running in the JVM to process the job
   o During execution, messages can be logged for the job, which results in a node insert to the job XML doc
   o The thumbnail job reads a list of binary references from the content doc
   o For each one it issues a request to an external service, then starts a polling thread to check for successful completion
      o There can be up to 10 of these polling threads going at once
      o They are waiting most of the time, not talking to ML
   o Messages can be logged to the job doc in the previous step, but the content doc is not touched
   o When the thumbnail result is ready, the results are inserted into the content doc in ML
   o The job finishes up and updates the state of the job doc
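
The polling part of this job can be sketched as below. This is illustration only: the real system runs on the JVM, and checkStatusFor is a hypothetical stand-in for the external thumbnail service. The defaults mirror the description above: at most 10 concurrent pollers and about 4-5 seconds between polls, with random jitter so the pollers do not all wake up together.

```javascript
// Illustrative sketch (JavaScript, although the system described runs on
// the JVM). checkStatusFor(ref) is a hypothetical stand-in for the external
// thumbnail service: it resolves to a result when the thumbnail is ready,
// or to null while it is still pending.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Base interval plus a random 0..jitterMs extra, so concurrent pollers
// spread out instead of waking up at the same moment.
function jitteredDelay(baseMs, jitterMs) {
  return baseMs + Math.floor(Math.random() * (jitterMs + 1));
}

// Poll one binary until its result is available. While waiting, nothing
// is sent to MarkLogic.
async function pollUntilReady(check, baseMs, jitterMs) {
  for (;;) {
    const result = await check();
    if (result !== null) return result;
    await sleep(jitteredDelay(baseMs, jitterMs));
  }
}

// Poll for every binary reference with at most maxConcurrent pollers active
// at once. Results come back in the same order as binaryRefs; writing them
// into the content document is left to the caller.
async function pollAll(binaryRefs, checkStatusFor,
                       { maxConcurrent = 10, baseMs = 4000, jitterMs = 1000 } = {}) {
  const results = new Array(binaryRefs.length);
  let next = 0;
  async function worker() {
    while (next < binaryRefs.length) {
      const i = next++; // safe: this JavaScript runs on a single thread
      results[i] = await pollUntilReady(
        () => checkStatusFor(binaryRefs[i]), baseMs, jitterMs);
    }
  }
  const count = Math.min(maxConcurrent, binaryRefs.length);
  await Promise.all(Array.from({ length: count }, () => worker()));
  return results;
}
```

Keeping the waiting entirely outside MarkLogic matches the description: the pollers hold no transaction open between polls, and only the final write touches the content document.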

   There is some lock contention for the job doc from multiple threads logging messages, but it’s not
normally significant.  We see the deadlocks logged by ML at debug level; they seem to resolve within a few
milliseconds, as expected, and the updates always complete quickly.

   When the results come back and the content doc is updated, there can be contention there as well.  Some jitter
is introduced to prevent the pollers from all waking up at once, but again this shouldn’t matter even if
they do.

   The odd phenomenon is that while one of these jobs is running (spending most of its time waiting, about
4-5 seconds between polls) on one of the worker bee nodes, a query sent from one of the API JVM nodes will take
many tens of seconds to complete.  Once the job has finished, query times return to normal (a few
milliseconds to 1-2 seconds depending on the specifics of the query).

   So the mystery is this: why would a pure query apparently block for a long time in this scenario?  Queries
should run lock-free, so even if there is lock contention with the thumbnail job, queries should not
be held up.  MarkLogic is not busy at all; nothing else is going on.

   This doesn’t happen on a single node, which makes me suspect something to do with cross-node lock
propagation.  But like I said, logging doesn’t indicate any sort of pathological lock storm or anything
like that.

   If someone can give me some assurance that the latest ML release will solve this problem, I’d be happy to
recommend that to the client.  But I’ve reviewed all the documented bug fixes since 8.0-3 and nothing
seems relevant.

   This is a rather urgent problem, since all this thumbnail processing must be completed soon without making
the rest of the system unusable.

   Thanks in advance.

---
Ron Hitchens {ron <at> overstory.co.uk}  +44 7879 358212

_______________________________________________
General mailing list
General <at> developer.marklogic.com
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general
Gary Russo | 16 Jun 04:25 2016

[MarkLogic Dev General] How to measure Read Queries Per Second using Monitoring History Tool

I'd like to measure the read query capacity of an existing cluster using
the Monitoring History Tool.

My objective is to determine the average "Read Queries Per Second" during
a spurt of query activity.

My plan is to use the "Query Read Rate" from the Disk I/O graph.

Here's the info: http://docs.marklogic.com/guide/monitoring/history#id_21175

I ran some JMeter scripts to measure the average request response time, which
was 0.74 seconds.

During this time, I saw the following Disk I/O and Memory I/O metrics.

Disk I/O
  Query Read Rate:   0.03 MB/sec
  Merge Read Rate:  28.00 MB/sec

Memory I/O
  Page-In Rate:   22,436.3 pages/sec
  Page-Out Rate:  19,345.8 pages/sec

How does this translate to Queries Per Second (QPS)?

The average document (fragment) size is 4 KB (4096 bytes).

The Query Read Rate is 0.03 MB/sec, which I take as 30 KB/sec (30,720 bytes/sec).

Since each query returns a 4 KB document, 30,720 divided by 4,096 gives 7.5
QPS.

If each query returned just a 1,024-byte search snippet, it would be 30
QPS.

So I assume my QPS rates are:
 -  7.5 QPS when returning documents
 - 30.0 QPS when returning search snippets
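
The estimate above reduces to dividing the disk query read rate by the bytes each query is assumed to read. The snippet below only reproduces that arithmetic; it inherits the assumption that every query reads exactly one payload of that size from disk, so any query that reads more, less, or nothing from disk would skew the result.

```javascript
// Reproduces the back-of-the-envelope QPS estimate from the email.
// Assumption (from the email): each query reads exactly bytesPerQuery
// bytes from disk, so QPS = read rate / bytes per query.
function estimateQps(readRateBytesPerSec, bytesPerQuery) {
  return readRateBytesPerSec / bytesPerQuery;
}

// 0.03 MB/sec read as 30 KB/sec, as in the email.
const readRate = 30 * 1024; // 30,720 bytes/sec

console.log(estimateQps(readRate, 4096)); // 7.5 (whole 4 KB documents)
console.log(estimateQps(readRate, 1024)); // 30 (1 KB search snippets)
```

Taking 1 MB as 1,048,576 bytes instead, 0.03 MB/sec is about 31,457 bytes/sec, which gives roughly 7.7 and 30.7 QPS; the difference is well within the rounding of a figure displayed to two decimal places.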

Please let me know if I'm missing something.

Gary Russo
NoSQL Architect and Developer
http://garyrusso.wordpress.com
http://twitter.com/garyprusso

Mani, Sivasubramani (ELS) | 15 Jun 12:08 2016

[MarkLogic Dev General] . Re: Regarding XDMP-PREVENTDEADLOCKS (Geert Josten)

Hi Geert,

I have now changed the code like this:

return <a href="{$const:BATCH-LIST-URL}" onClick="return
deleteScheduledBulkBatch('{$batchDocUri}','{$batchDirUri}')">Click here to delete</a>

JS

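function deleteScheduledBulkBatch(batchDocUri, batchDirUri) {
	// Parameter order matches the onClick call: (batchDocUri, batchDirUri).
	try
	{
		// Empty for now: browser JavaScript cannot call MarkLogic functions
		// such as xdmp:document-delete directly (see the question below).
	}
	catch(err)
	{
		alert(err.message);
	}
	return true;
}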

But I don't know how to call MarkLogic functions such as xdmp:document-delete,
xdmp:directory-delete and doc-available from inside the JavaScript. Could you please help me?
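
Browser JavaScript runs in the user's browser, not inside MarkLogic, so it cannot call xdmp:document-delete, xdmp:directory-delete or doc-available directly; it can only send an HTTP request to a module on the app server, in line with Geert's suggestion of making the link point to a module that executes the task. A client-side sketch, assuming a hypothetical XQuery module at /batch/delete-batch.xqy that reads hypothetical "doc" and "dir" request fields (for example with xdmp:get-request-field) and performs the deletes itself:

```javascript
// Browser-side sketch. DELETE_MODULE_URL and the "doc"/"dir" parameter
// names are hypothetical; the module at that URL is assumed to perform the
// actual xdmp:document-delete / xdmp:directory-delete on the server.
const DELETE_MODULE_URL = "/batch/delete-batch.xqy";

// Build the request URL, percent-encoding the two URIs.
function buildDeleteUrl(moduleUrl, batchDocUri, batchDirUri) {
  const params = new URLSearchParams({ doc: batchDocUri, dir: batchDirUri });
  return `${moduleUrl}?${params.toString()}`;
}

// Same parameter order as the onClick call: (batchDocUri, batchDirUri).
function deleteScheduledBulkBatch(batchDocUri, batchDirUri) {
  fetch(buildDeleteUrl(DELETE_MODULE_URL, batchDocUri, batchDirUri), { method: "POST" })
    .then((response) => {
      if (!response.ok) {
        throw new Error(`Delete failed: HTTP ${response.status}`);
      }
      // Reload so the batch list no longer shows the deleted batch.
      window.location.reload();
    })
    .catch((err) => alert(err.message));
  // onClick does "return deleteScheduledBulkBatch(...)", so returning false
  // stops the link from navigating before the request finishes.
  return false;
}
```

Because the delete then runs in its own request on the server, it is no longer evaluated while the HTML is being generated.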

Thanks & Regards,
Siva

-----Original Message-----
From: general-bounces@...
[mailto:general-bounces@...] On Behalf Of general-request@...
Sent: Wednesday, June 15, 2016 8:54 AM
To: general@...
Subject: General Digest, Vol 144, Issue 12


Today's Topics:

   1. Re: Regarding XDMP-PREVENTDEADLOCKS (Geert Josten)

----------------------------------------------------------------------

Message: 1
Date: Wed, 15 Jun 2016 07:53:52 +0000
From: Geert Josten <Geert.Josten@...>
Subject: Re: [MarkLogic Dev General] Regarding XDMP-PREVENTDEADLOCKS
To: MarkLogic Developer Discussion <general@...>
Message-ID: <D386D52C.DC911%geert.josten@...>
Content-Type: text/plain; charset="us-ascii"

Hi Siva,

Is this an xqy module inside MarkLogic? It will probably run the xdmp:spawn-function when producing the
HTML, rather than on click. I think you had better use an href that links to the page itself with a request-param
indicating that some action is required, and handle that elsewhere in that code, or make the link point to a
different module that will execute the task.

Re prevent-deadlocks: you can disable the safeguard with the <prevent-deadlocks> option; see also http://docs.marklogic.com/xdmp:eval.

Cheers,
Geert

From:
<general-bounces@...<mailto:general-bounces@...>>
on behalf of "Mani, Sivasubramani (ELS)" <s.mani@...<mailto:s.mani@...>>
Reply-To: MarkLogic Developer Discussion <general@...<mailto:general@...>>
Date: Wednesday, June 15, 2016 at 9:24 AM
To:
"general@...<mailto:general@...>" <general@...<mailto:general@...>>
Subject: [MarkLogic Dev General] Regarding XDMP-PREVENTDEADLOCKS

Hi team,

I got the XDMP-PREVENTDEADLOCKS error when I tried to call a MarkLogic function inside the HTML href
onClick event. Kindly guide me on how to achieve this.

Code:

<dt>Delete</dt>,
<dd>{
  let $userId := auth:get-current-user-id()
  let $batchId := xs:string(data($batch/@id))
  let $batchDocUri := batlib:get-batch-xml-uri($userId, $batchId)
  let $batchDirUri := concat("/repo/user/", $userId,
    $const:PATH-SEP, $batcon:BATCHES, $const:PATH-SEP, $batchId, $const:PATH-SEP)

  return <a href="{$const:BATCH-LIST-URL}" onClick="{
    xdmp:spawn-function(function() {
      let $deletebBatchDir := try {
          xdmp:directory-delete($batchDirUri)
        } catch($e) {
          ""
        }
      return (
        if (doc-available($batchDocUri)) then
          xdmp:document-delete($batchDocUri)
        else (),
        xdmp:commit()
      )
    },
    <options xmlns="xdmp:eval">
      <transaction-mode>update</transaction-mode>
    </options>)

Error Message:

<error:code>XDMP-PREVENTDEADLOCKS</error:code>
  <error:name/>
  <error:xquery-version>1.0-ml</error:xquery-version>
  <error:message>Processing an update from an update with different-transaction isolation could deadlock</error:message>
  <error:format-string>XDMP-PREVENTDEADLOCKS:
xdmp:invoke("/application/views/batch/available-batches.xqy", (fn:QName("", "view-data"),
map:map(&lt;map:map xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:map="http://marklogic.com/xdmp/map"&gt;&lt;map:entry
key="batchType"&gt;&lt;map:value xsi:type="xs:string"&gt;norma...&lt;/map:map&gt;),
fn:QName("", "form-data"), ...), &lt;options
xmlns="xdmp:eval"&gt;&lt;isolation&gt;different-transaction&lt;/isolation&gt;&lt;prevent-deadlocks&gt;t...&lt;/options&gt;)
-- Processing an update from an update with different-transaction isolation could deadlock</error:format-string>
  <error:retryable>false</error:retryable>

Thanks & Regards,
Siva


------------------------------


End of General Digest, Vol 144, Issue 12
****************************************
ShamsTabish Sheikh | 14 Jun 13:49 2016

[MarkLogic Dev General] Avoiding Facets in search:snippet highlight


Hello team,
    We are trying to display the search keyword matches to the user in our app.
I have used search:snippet to highlight the matches, but in some cases facet values appear in the results, which we don't want. Please suggest a way to avoid facet values in search:highlight.

Thanks & Regards,
Tabish.
