Tyagi, Devesh | 27 Jul 15:59 2015

[MarkLogic Dev General] Difference in output of sem:sparql-values and sem:sparql

Hi,


let $params := map:new(map:entry("predicate",sem:iri("http://www.bsi.org/predicates/hasTaxonomyName")))

sem:sparql-values("select ?subject ?predicate ?object where {?subject ?predicate ?object}", $params)


returns 3 columns, namely (subject, predicate, object),


whereas

sem:sparql("select ?subject ?predicate ?object where {?subject ?predicate ?object}", $params)


returns only two columns, namely (subject, object).


I would like to know why the behavior differs, since I am selecting 3 columns in the query.


Regards,

Devesh

"This e-mail and any attachments transmitted with it are for the sole use of the intended recipient(s) and may contain confidential , proprietary or privileged information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this e-mail or any action taken in reliance on this e-mail is strictly prohibited and may be unlawful."
Tyagi, Devesh | 27 Jul 15:55 2015

[MarkLogic Dev General] Caching of results

Hi,


While profiling a query, I noticed that only the first execution reports a non-zero execution time; subsequent executions of the same query report 0 seconds. I assume the results are being cached. Can someone point me to a source where I can find out more about this caching?


Regards,

Devesh
Tyagi, Devesh | 27 Jul 09:55 2015

[MarkLogic Dev General] XQuery evaluation

Hi,


I have the following piece of code in XQuery:


declare variable $triple-predicate-prefix := "http://www.bsi.org/predicates/";

declare variable $triple-predicate-suffix-hasTaxonomyName := "hasTaxonomyName";


declare variable $triple-predicates-map := map:map();

declare variable $initialize-variables-caller := local:initialize-variables();


declare function local:initialize-variables(){

let $_ := map:put($triple-predicates-map,$triple-predicate-suffix-hasTaxonomyName,fn:concat($triple-predicate-prefix,$triple-predicate-suffix-hasTaxonomyName))

return ()

};


declare function local:test(){

map:get($triple-predicates-map,$triple-predicate-suffix-hasTaxonomyName)

};


local:test()


This returns an empty sequence; the map doesn't get initialized. It would be helpful if someone could point me to the reason and a solution.


Thanks and regards,

Devesh
Indrajeet Verma | 23 Jul 15:57 2015

[MarkLogic Dev General] Regarding Excluding a node from search

Hi All,

I am doing free-text search with the search library (search:search, search:resolve, etc.). Our documents contain <footnote> elements, and I don't want these nodes included in my search.

For example, given the document below:

If I search for "Effective", no result should be returned.

If I search for "confidential", a result should be returned, but the match count should be 2, not 3. Also, the occurrence of "confidential" inside the footnote should not be highlighted.

<doc>
<meta>
</meta>
<footnote id="R09-IDANDNQ-R09-IDAE00T-FN7" content-type="secondary">
        <para>
            <text>Effective February 3, 2015, confidential  </text>
       </para>
</footnote>
<num>(a)</num>
    <para>
        <text>An SCI entity (as defined in 
            of this chapter) shall not omit the confidential portion from the material filed in electronic format on Form SCI pursuant to Regulation SCI,           
            <italic>et seq</italic>., and, in lieu of the procedures described in 
             of this section, may request confidential treatment of all information provided on Form SCI by completing Section IV of Form SCI.
        </text>
    </para>
</doc>
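
The expected counts above can be pinned down with a small Python sketch. This is only an illustration of the desired behaviour, not a MarkLogic query or Search API configuration: the document is a shortened stand-in for the sample, and the helper names (text_outside, count_matches) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the sample document (an assumption, not the full text).
DOC = """<doc>
<footnote id="fn"><para><text>Effective February 3, 2015, confidential</text></para></footnote>
<para><text>shall not omit the confidential portion <italic>et seq</italic>., and
may request confidential treatment of all information</text></para>
</doc>"""


def text_outside(elem, excluded):
    """Concatenate the text under elem, skipping subtrees whose tag is excluded."""
    if elem.tag in excluded:
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(text_outside(child, excluded))
        # The tail follows the child but belongs to the parent, so it is kept.
        parts.append(child.tail or "")
    return "".join(parts)


def count_matches(doc_xml, term, excluded=frozenset()):
    """Count case-insensitive occurrences of term outside the excluded elements."""
    text = text_outside(ET.fromstring(doc_xml), excluded).lower()
    return text.count(term.lower())


print(count_matches(DOC, "confidential"))                # all occurrences
print(count_matches(DOC, "confidential", {"footnote"}))  # footnote ignored
print(count_matches(DOC, "Effective", {"footnote"}))     # footnote ignored
```

With the footnote excluded, "confidential" is counted twice and "Effective" not at all, which is the behaviour being asked for.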


Is there any way to exclude elements from the search? 

Regards,
Indy
Andreas Hubmer | 23 Jul 14:31 2015

[MarkLogic Dev General] Query Result Downloading (QConsole)

Hello,

Is there a possibility to download the result of a query instead of displaying it in the Query Console? Sometimes the result is just too big.

Best regards,
Andreas

--
Andreas Hubmer
IT Consultant

Web: http://www.ebcont.com


OUR TEAM IS YOUR SUCCESS

Tim | 18 Jul 18:49 2015

[MarkLogic Dev General] Where can I download the latest MarkLogic 7 XCC libraries for Java and .Net?

All I can find are the ML 8 libraries.

 

Tim Meagher

 

cyanline llc | 17 Jul 22:02 2015

[MarkLogic Dev General] test

test
cyanline llc | 17 Jul 20:49 2015

[MarkLogic Dev General] learn to download ingested file

Hi all,
   What resources can I use to learn how to allow the user interface to 
download a file that was ingested into ML?

   I am breaking apart an email archive, storing the metadata in XML 
format, and copying attachments to the email messages into a directory. 
Then, it is all ingested into ML.

   On my rest-app results I set the attachment file name to display, and 
now I am looking to make it a link to serve the user the attachment file.

   Below is an example:

[message-65] ls -R 18:41:45
.:
attachments  message-65.xml

./attachments:
fresh~1.pdf

   I'm using ML7 and I searched markmail developer list for similar 
queries, to no avail.

Thank you
Danny Sinang | 17 Jul 19:51 2015

[MarkLogic Dev General] Displaying document after an update

I need to display a document right after a call to xdmp:node-replace() is made, and I'm able to achieve this by using a semi-colon to place them in separate transactions.

    xdmp:node-replace($environment, $new-environment);

    fn:doc("/release-tracking/components/2.xml")

The problem is that only the first transaction has enough data to derive the URI of the document to be displayed.

Hence the hardcoded path in the second transaction.

Is there a way to pass this URI from the first to the second transaction?

Or is there another, better way of displaying the document after the update?

Regards,

Danny




Dan Meyers | 15 Jul 15:37 2015

[MarkLogic Dev General] Is a cts:query looking at specific attributes only within the same element possible?

I feel sure it should be, and I’m just failing to make proper use of the system, but I’d appreciate more eyes on this.

Consider the following set of (contrived, simple) documents:

Doc 1:
<test uuid="X" >
<relationships>
<relationship id="one" />
<relationship id="two" inherited="true" />
</relationships>
</test>

Doc 2:
<test uuid="Y">
<relationships>
<relationship id="one" inherited="false" />
<relationship id="three" />
</relationships>
</test>

Doc 3:
<test uuid="Z">
<relationships>
<relationship id="one" inherited="true" />
<relationship id="four" />
</relationships>
</test>

I want to be able to return all those documents which contain at least one relationship with id equal to “one” and inherited not equal to “true” (i.e. false or not present). So in the above example I’d expect docs 1 and 2 to be returned. In the real world I’ll be searching across over 100 million documents, so I need to be able to do this via indexes and XQuery, not by looping over all available documents and examining their content.

With a cts:and-not-query looking for the presence of id as “one” and not having inherited as “true”, only doc 2 is returned. As I understand it, this is because the and-not-query matches inherited equal to “true” in the second relationship of doc 1 and discards the entire document, even though that relationship does not have the desired id.
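
The difference between the two matching scopes can be sketched in plain Python. This only illustrates the semantics described above on the three sample documents; it is not a cts:query, it uses no indexes, and the function names (document_scoped, element_scoped, select) are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three sample documents from this message, with whitespace removed.
DOCS = {
    "doc1": '<test uuid="X"><relationships>'
            '<relationship id="one"/>'
            '<relationship id="two" inherited="true"/>'
            '</relationships></test>',
    "doc2": '<test uuid="Y"><relationships>'
            '<relationship id="one" inherited="false"/>'
            '<relationship id="three"/>'
            '</relationships></test>',
    "doc3": '<test uuid="Z"><relationships>'
            '<relationship id="one" inherited="true"/>'
            '<relationship id="four"/>'
            '</relationships></test>',
}


def document_scoped(root):
    """id="one" somewhere in the document AND inherited="true" nowhere in it."""
    rels = list(root.iter("relationship"))
    has_id = any(r.get("id") == "one" for r in rels)
    has_inherited = any(r.get("inherited") == "true" for r in rels)
    return has_id and not has_inherited


def element_scoped(root):
    """A single relationship has id="one" and is not inherited="true"."""
    return any(
        r.get("id") == "one" and r.get("inherited") != "true"
        for r in root.iter("relationship")
    )


def select(predicate):
    """Names of the sample documents that satisfy the predicate."""
    return [name for name, xml in DOCS.items() if predicate(ET.fromstring(xml))]


print(select(document_scoped))  # ['doc2']
print(select(element_scoped))   # ['doc1', 'doc2']
```

The document-scoped test reproduces the observed result (doc 2 only), while the element-scoped test gives the desired result (docs 1 and 2).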

Is there any kind of query (with any necessary indexes) I can construct that will do what I want and only pay attention to the inherited field when it is within the same relationship element as id?

An alternative we have considered is to create 2 different element types within the relationships parent, so that we have relationship, and relationship_direct (or whatever), and the query that doesn’t want to count inherited relationships looks only at relationship_direct elements, but that seems like a hacky method of doing this. We’ve also considered separate documents, but because of all the other data held within them and not shown in this basic example that would be a massive headache. If it would help, we could ensure the inherited attribute was always present, and set to true or false as necessary, rather than normally being either true or not present.

How would other people go about doing this? Any ideas would be great. In case it matters, and there’s some new MarkLogic 8 function that does this, we’re currently on MarkLogic 7 in our live environment. We will be upgrading eventually, but not soon enough for us to be able to wait for that before we make this query work.

Thanks

Dan

 

----------------------------

http://www.bbc.co.uk
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

---------------------

Morales-Martin, Kristina | 13 Jul 17:45 2015

Re: [MarkLogic Dev General] mlcp.sh help with filtering to ingest only XML files in zip files

Addendum:

We actually send this regular expression, with the dot escaped, yet mlcp.sh import still does not filter the files as desired:

 

-input_file_pattern '.*\.xml'
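
To see what escaping the dot changes at the regular-expression level, here is a minimal Python sketch. It assumes the pattern has to match the whole name, and the file names are hypothetical. It says nothing about which names mlcp actually tests the pattern against when -input_compressed is true.

```python
import re

# The two patterns from this thread: dot unescaped and dot escaped.
UNESCAPED = re.compile(r".*.xml")
ESCAPED = re.compile(r".*\.xml")

# Hypothetical names, chosen to show where the two patterns disagree.
NAMES = ["doc/one.xml", "doc/figure.png", "doc/onexml", "doc/one.xml.bak"]


def matching(pattern, names):
    """Return the names that the pattern matches in full."""
    return [name for name in names if pattern.fullmatch(name)]


print(matching(UNESCAPED, NAMES))  # ['doc/one.xml', 'doc/onexml']
print(matching(ESCAPED, NAMES))    # ['doc/one.xml']
```

Both patterns reject the .png and .bak names, so the unescaped dot only lets through odd names such as "onexml"; on its own it would not explain non-XML files being loaded.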

 

From: Morales-Martin, Kristina
Sent: Monday, July 13, 2015 11:43 AM
To: 'general-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org'
Subject: mlcp.sh help with filtering to ingest only XML files in zip files

 

Dear all,

 

We need help in ingesting a directory of many* zip files, each with many* XML files.

 

We are using mlcp (MarkLogic Content Pump) out of the box to import content as-is from a directory of zip files.

 

In particular, we are using these options:

-mode local \

-input_file_path [a directory that has zip files, each zip file has a heterogeneous mix of .xml and other binary files] \

-input_compressed true \

-input_file_pattern '.*.xml' \

-output_uri_replace "(\/.+\/+)(?=.+\.zip),'/ourOverrideOfTheURIToRemoveTheLeadingNASPath/'" \

 

Can anyone help with the -input_file_pattern option?  Our intent is to load only the .xml files in the zip files in the directory.

We do not want to load other files.  For some reason, -input_file_pattern is not filtering successfully.

If you have encountered this non-filtering behavior, what have you done to make it work?

 

Thank you,

Kristina Morales-Martin
Sr. Technical Information Specialist, e-Content Operations

CAS, a division of the American Chemical Society
2540 Olentangy River Road
Columbus, OH 43202
Phone: 614-447-3600, ext. 2322
Fax: 614-447-3827
www.cas.org

 

Confidentiality Notice: This electronic message transmission, including any attachment(s), may contain confidential, proprietary, or privileged information from Chemical Abstracts Service (“CAS”), a division of the American Chemical Society (“ACS”). If you have received this transmission in error, be advised that any disclosure, copying, distribution, or use of the contents of this information is strictly prohibited. Please destroy all copies of the message and contact the sender immediately by either replying to this message or calling 614-447-3600.

