Gary Russo | 20 Oct 16:43 2014

Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

Hello Ron,

Yes, it is feasible to do the metadata extraction upstream of MarkLogic.

It complicates things a little, but it will be OK.

Apache Tika looks like a nice solution.

My client is a Microsoft shop, and they use a product called Aspose to convert and extract data from spreadsheets.

Most of the spreadsheets that I need to ingest use the older 97/2003 format. I can use the Aspose API to convert the older format to OOXML on the fly.

It’s unfortunate that the MarkLogic xdmp:document-filter() API is not able to extract the “defined name” metadata from the 97/2003 file format.

I consider this a bug in the MarkLogic API, because other Excel extraction APIs (e.g., Aspose, Tika, Apache POI) can extract this data from the older file format.

Anyway, thanks for the info.

- Gary R

From: general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org [mailto:general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org] On Behalf Of Ron Hitchens
Sent: Friday, October 17, 2014 11:52 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., before insertion), you might take a look at Apache Tika. It's designed for this sort of thing.

You could also set it up as a simple web service callable from MarkLogic. POST the spreadsheet to it and have it return the metadata in whatever form you like.


---

Ron Hitchens {ron-ECtqR1qVIOE7VdE/fOJbLw@public.gmane.org} +44 7879 358212
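
As a rough sketch of the web-service route described in Ron's message, the call from MarkLogic might look something like the following. The endpoint URL, the document URI, and the content type header are assumptions made purely for illustration; nothing in this thread specifies them. xdmp:http-post is the built-in that sends the request.

```xquery
xquery version "1.0-ml";

(: Hypothetical URI of an .xls file already stored as a binary document. :)
let $xls := fn:doc("/spreadsheets/revenue.xls")/binary()

(: Hypothetical extraction service (for example, a thin wrapper around
   Tika or POI) that accepts a spreadsheet and returns its metadata. :)
let $endpoint := "http://localhost:8080/extract-metadata"

(: The third argument carries the request body. :)
let $response :=
  xdmp:http-post(
    $endpoint,
    <options xmlns="xdmp:http">
      <headers>
        <content-type>application/vnd.ms-excel</content-type>
      </headers>
    </options>,
    $xls
  )

(: The first item describes the HTTP response; the second is the body. :)
return $response[2]
```

What comes back in the body depends entirely on how the service is written, so any parsing of it is left out here.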

 

On Oct 17, 2014, at 3:35 PM, Gary Russo <garyrusso-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org> wrote:



Hello Dennis,

Thanks for the info.

Yes, I tried xdmp:excel-convert(), but it does not get the worksheet metadata either.

The metadata that I need to retrieve from the older Excel format is the “Named Fields”.

Users create them using the Excel “Name Box” feature, as shown here: http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm

It looks like my only option is to use the Apache POI Java API, either to extract the named fields directly or to convert xls to xlsx on the fly: https://poi.apache.org/apidocs

I know there’s a hidden way to use MarkLogic’s underlying JVM.

It would be great if I could use it to call the Apache POI code.

But that’s a question for another day.

Thanks again,

Gary Russo

Gary Russo
Enterprise NoSQL Developer

From: general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org [mailto:general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org] On Behalf Of David Ennis
Sent: Thursday, October 16, 2014 5:02 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

 

Hi.

I believe that with the conversion licence, you can do what you want with xdmp:excel-convert.

Barring that, you could always run OpenOffice as a headless server for conversion purposes.

Kind Regards,

David Ennis
Content Engineer

Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80

On 16 October 2014 20:00, Gary Russo <garyrusso-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org> wrote:

I need to extract worksheet metadata called “defined name” from Excel 97/2003 formatted spreadsheets.

The ISYS xdmp:document-filter() API is limiting because it only extracts the text.

It does not extract any worksheet metadata.

Does anyone know of a workaround for this?

My only thought is to upload the “Excel 97/2003” xls file and then convert it on the server to an “Excel 2010” xlsx format.

Once it’s in an Excel 2010 format, I can easily extract the “defined name” metadata.

This is what it looks like in “Excel 2010” files:

  <definedNames>
    <definedName name="LastYr">Revenue!$B$6:$B$15</definedName>
    <definedName name="ThisYr">Revenue!$C$6:$C$15</definedName>
    <definedName name="Variance">Revenue!$D$6:$D$15</definedName>
  </definedNames>

 

 

Thanks,

Gary Russo

Gary Russo
Enterprise NoSQL Developer
Phone: 212-404-8639
Skype: garyprusso


_______________________________________________
General mailing list
General-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org
http://developer.marklogic.com/mailman/listinfo/general

 


 

Wanczowski, Andrew | 17 Oct 19:38 2014

Re: [MarkLogic Dev General] Element Range Query with DateTime and Durations

Hi John and Dave, 

In testing, the provided example worked well:

xquery version "1.0-ml";

(: Anything published on or before this date is at least 90 days old. :)
let $cutoff := fn:current-date() - xs:dayTimeDuration("P90D")

(: Index-backed search; needs a date range index on publishedDate. :)
let $results :=
  cts:search(
    fn:doc(),
    cts:element-range-query(xs:QName("publishedDate"), "<=", $cutoff)
  )[1 to 10]

let $onSaleDates := (
  xs:date("2014-07-18"), (: July 18 published date :)
  xs:date("2014-07-19"), (: July 19 published date :)
  xs:date("2014-07-20")  (: July 20 published date :)
)

(: Pair each sample date with whether it falls on or before the cutoff. :)
let $testCases :=
  for $onSaleDate in $onSaleDates
  return ($onSaleDate, $onSaleDate <= $cutoff)

(: Only the test cases are returned; $results is not part of the output. :)
return $testCases


I am trying to solve a larger problem: filtering over larger datasets (5M+ documents), where the durations are variables that come from one set of documents (business rules) and filter a search of other documents (articles). Basically, the goal is to filter out content with various embargo durations. The "business rules" only state the duration after the published date, not the actual dates the embargo ends.

Would you say there is anything to watch out for from a performance standpoint? I have range indexes set up on all the fields that require calculations.
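
One way to read the advice in this thread for the variable-duration case is to compute a cutoff date per business rule up front, then let each cutoff drive its own range query, so that each comparison is against a constant rather than a per-document calculation. The sketch below assumes a hypothetical rule shape (a `rule` element with `section` and `embargo` children, stored in a `business-rules` collection) and a `section` element on articles. None of these names come from the thread.

```xquery
xquery version "1.0-ml";

(: Hypothetical rule documents, e.g.
   <rule><section>news</section><embargo>P90D</embargo></rule> :)
let $rules := fn:collection("business-rules")/rule
let $today := fn:current-date()

(: One constant cutoff per rule, computed once before searching. :)
let $rule-queries :=
  for $rule in $rules
  let $cutoff := $today - xs:dayTimeDuration($rule/embargo)
  return
    cts:and-query((
      cts:element-value-query(xs:QName("section"), fn:string($rule/section)),
      cts:element-range-query(xs:QName("publishedDate"), "<=", $cutoff)
    ))

(: An article matches if it is past the embargo of the rule for its section. :)
return
  cts:search(fn:doc(), cts:or-query($rule-queries))[1 to 10]
```

This assumes every embargo is expressed in days (for example "P90D"). A duration given in months or years would need xs:yearMonthDuration instead, since xs:dayTimeDuration does not accept those components.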

Thanks
Drew

From: Dave Cassel <Dave.Cassel-efBvD/aTHCF8UrSeD/g0lQ@public.gmane.org>
Date: Friday, October 17, 2014 12:09 PM
To: Andrew Wanczowski <Andrew_Wanczowski-xN+CextoL1Lby3iVrkZq2A@public.gmane.org>
Subject: Re: [MarkLogic Dev General] Element Range Query with DateTime and Durations

Drew, I wasn't sure how familiar you are with durations, so in case John's answer didn't give you what you need --

cts:element-range-query(
  xs:QName("date"), 
  ">",
  fn:current-date() - xs:dayTimeDuration("P90D")
)


-- 
Dave Cassel
Developer Community Manager
MarkLogic Corporation
Cell:  +1-484-798-8720


From: <Wanczowski>, Andrew <Andrew_Wanczowski <at> condenast.com>
Reply-To: MarkLogic Developer Discussion <general <at> developer.marklogic.com>
Date: Friday, October 17, 2014 at 6:42 AM
To: "general-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org" <general-ld4jwAGwUXTgXEvjvSGRgEPhkuQigjxi@public.gmane.org.com>
Subject: Re: [MarkLogic Dev General] Element Range Query with DateTime and Durations

Thanks John. I'll give that a shot.

On 10/17/14 9:35 AM, "John Snelson" <John.Snelson-efBvD/aTHCF8UrSeD/g0lQ@public.gmane.org> wrote:

Work out a dateTime 90 days before the current dateTime, and query for
articles with a published dateTime before that dateTime.
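
Written out for an element that holds a full dateTime rather than a date, the suggestion above might look like the sketch below. The element name `publishedDateTime` is hypothetical, and the sketch assumes a dateTime range index exists on that element.

```xquery
xquery version "1.0-ml";

(: The cutoff is a constant, so the query compares indexed values against
   a single dateTime instead of doing arithmetic on each document. :)
let $cutoff := fn:current-dateTime() - xs:dayTimeDuration("P90D")

(: "publishedDateTime" is a hypothetical element name. :)
return
  cts:search(
    fn:doc(),
    cts:element-range-query(xs:QName("publishedDateTime"), "<", $cutoff)
  )[1 to 10]
```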

John

On 17/10/14 14:26, Wanczowski, Andrew wrote:
Hi All,

Is it possible to do a dateTime range query based on the element's value
plus or minus an xs:dayTimeDuration? For example, I want to find all
articles that are 90 days past the published date.

The documentation
(http://docs.marklogic.com/cts:element-range-query) gives a few
examples of date queries, but they all have a supplied dateTime.

Thanks
Drew

--
John Snelson, Lead Engineer                    http://twitter.com/jpcs
MarkLogic Corporation                         http://www.marklogic.com
_______________________________________________
General mailing list


Gary Russo | 17 Oct 16:35 2014

Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

Hello Dennis,

Thanks for the info.

Yes, I tried xdmp:excel-convert(), but it does not get the worksheet metadata either.

The metadata that I need to retrieve from the older Excel format is the “Named Fields”.

Users create them using the Excel “Name Box” feature, as shown here: http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm

It looks like my only option is to use the Apache POI Java API, either to extract the named fields directly or to convert xls to xlsx on the fly: https://poi.apache.org/apidocs

I know there’s a hidden way to use MarkLogic’s underlying JVM.

It would be great if I could use it to call the Apache POI code.

But that’s a question for another day.

Thanks again,

Gary Russo

Gary Russo
Enterprise NoSQL Developer
http://garyrusso.wordpress.com
http://twitter.com/garyprusso

From: general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org [mailto:general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org] On Behalf Of David Ennis
Sent: Thursday, October 16, 2014 5:02 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

 

Hi.

I believe that with the conversion licence, you can do what you want with xdmp:excel-convert.

Barring that, you could always run OpenOffice as a headless server for conversion purposes.

Kind Regards,

David Ennis
Content Engineer

Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80

On 16 October 2014 20:00, Gary Russo <garyrusso-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org> wrote:

I need to extract worksheet metadata called “defined name” from Excel 97/2003 formatted spreadsheets.

The ISYS xdmp:document-filter() API is limiting because it only extracts the text.

It does not extract any worksheet metadata.

Does anyone know of a workaround for this?

My only thought is to upload the “Excel 97/2003” xls file and then convert it on the server to an “Excel 2010” xlsx format.

Once it’s in an Excel 2010 format, I can easily extract the “defined name” metadata.

This is what it looks like in “Excel 2010” files:

  <definedNames>
    <definedName name="LastYr">Revenue!$B$6:$B$15</definedName>
    <definedName name="ThisYr">Revenue!$C$6:$C$15</definedName>
    <definedName name="Variance">Revenue!$D$6:$D$15</definedName>
  </definedNames>
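
For reference, once a workbook is in the xlsx (OOXML) format, defined names like the ones above can be read inside MarkLogic by opening the package as a zip. The following is a minimal sketch, assuming the file is already loaded as a binary document at a hypothetical URI. `xl/workbook.xml` is the part of an OOXML spreadsheet package that carries the `definedNames` element; the output element name is made up for illustration.

```xquery
xquery version "1.0-ml";

declare namespace ss = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

(: Hypothetical URI of an .xlsx file stored as a binary document. :)
let $xlsx := fn:doc("/spreadsheets/revenue.xlsx")/binary()

(: Pull the workbook part out of the zip package as XML. :)
let $workbook :=
  xdmp:zip-get(
    $xlsx,
    "xl/workbook.xml",
    <options xmlns="xdmp:zip-get"><format>xml</format></options>
  )

(: Emit one element per defined name. :)
for $dn in $workbook//ss:definedName
return
  <defined-name name="{ $dn/@name }">{ fn:string($dn) }</defined-name>
```

This only helps after the 97/2003 file has been converted, which is the step this thread is trying to solve.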

 

 

Thanks,

Gary Russo

Gary Russo
Enterprise NoSQL Developer
Phone: 212-404-8639
Skype: garyprusso
http://garyrusso.wordpress.com


_______________________________________________
General mailing list
General-ld4jwAGwUXTgXEvjvSGRgNi2O/JbrIOy@public.gmane.orgic.com
http://developer.marklogic.com/mailman/listinfo/general

 

Girish Kulkarni | 16 Oct 00:47 2014

[MarkLogic Dev General] word query

I have some fields in my XML document, such as enrichedDateTime, that I don't want to index or search on. When I added enrichedDateTime to the word query exclusion list, my search stopped returning this document at all, even when I searched on other fields such as <content>. However, when I added the root element name <fix> to the inclusion list, the document came back. I had already set the include-root flag to true, but for some reason I am unable to search on the other fields in the document. Any ideas why this could be happening?

<fix>
<content> some content goes here </content>
<enrichedDateTime>2014-09-30T16:32:27.424443-07:00</enrichedDateTime>
</fix>



Girish Kulkarni
Kapoor, Pragya | 15 Oct 09:59 2014

[MarkLogic Dev General] using xdmp:eval in REST Service

Hi,

I need to pick up all the documents in a directory path ($Path) from the Ingestion database and insert them into the database that is configured for the REST services (rest-ingestion).

The code below works fine from Query Console. When it is invoked from the REST service, however, the documents read from the Ingestion database do not come through as XML: the logs show only the text content, and the XML elements are missing (logs attached).

Please let me know what I am missing in this code.

Thanks
Pragya

Code:

import module namespace dls = "http://marklogic.com/xdmp/dls"
    at "/MarkLogic/dls.xqy";

import module namespace functx = "http://www.functx.com"
    at "/MarkLogic/functx/functx-1.0-nodoc-2007-01.xqy";

let $transId := '39932186-9cab-44e9-8f4f-7ebf45dabf8f'
let $PrefixURI := "/docs/"
let $Path := fn:concat('/processing/', $transId, '/validDocs/')

(: Read every document under $Path from the Ingestion database. :)
let $DirectoryListing :=
    xdmp:eval('
        import module namespace functx = "http://www.functx.com"
            at "/MarkLogic/functx/functx-1.0-nodoc-2007-01.xqy";
        declare variable $Path as xs:string external;

        xdmp:directory($Path)
        ',
        (xs:QName("Path"), $Path),
        <options xmlns="xdmp:eval">
            <database>{xdmp:database("Ingestion")}</database>
        </options>
    )

for $FileEntry in $DirectoryListing
let $Filename := functx:substring-after-last(xdmp:node-uri($FileEntry), '/')
let $docUri := fn:concat($PrefixURI, $Filename)
let $_ := xdmp:log(fn:concat("uri", $docUri))
let $contents := $FileEntry
return (
    (: Insert the document under DLS management in the "historic" collection, :)
    dls:document-insert-and-manage(
        $docUri,
        fn:false(),
        $contents,
        "created",
        (xdmp:permission("dls-user", "read"),
         xdmp:permission("dls-user", "update")),
        "historic"),
    (: then move it from "historic" to "latest". :)
    xdmp:document-add-collections($docUri, "latest"),
    xdmp:document-remove-collections($docUri, "historic")
)


2014-10-15 13:09:34.506 Info: rest-ingestion:          4076
2014-10-15 13:09:34.506 Info: rest-ingestion:          REPO-GMR-2000
2014-10-15 13:09:34.506 Info: rest-ingestion:          2010-09-22
2014-10-15 13:09:34.506 Info: rest-ingestion:          BAU
2014-10-15 13:09:34.506 Info: rest-ingestion:          1-44
2014-10-15 13:09:34.506 Info: rest-ingestion:       
2014-10-15 13:09:34.506 Info: rest-ingestion:       
2014-10-15 13:09:34.506 Info: rest-ingestion:          
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:                   Euro
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:             Same daySame day
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:          
2014-10-15 13:09:34.506 Info: rest-ingestion:          
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:                   Yes
2014-10-15 13:09:34.506 Info: rest-ingestion:                   STAR FINANCIAL SERVICES LTD.
2014-10-15 13:09:34.506 Info: rest-ingestion:                   New York
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:                   No
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:          
2014-10-15 13:09:34.506 Info: rest-ingestion:          
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:                   Failure to Deliver Securities
2014-10-15 13:09:34.506 Info: rest-ingestion:                   Standard
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:          
2014-10-15 13:09:34.506 Info: rest-ingestion:          
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:                Not applicable
2014-10-15 13:09:34.506 Info: rest-ingestion:                Not applicable
2014-10-15 13:09:34.506 Info: rest-ingestion:                Not applicable
2014-10-15 13:09:34.506 Info: rest-ingestion:                Standard
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:                   
2014-10-15 13:09:34.506 Info: rest-ingestion:                      MAKEMAKE BANK, N.A.
2014-10-15 13:09:34.506 Info: rest-ingestion:                      Active
2014-10-15 13:09:34.506 Info: rest-ingestion:                      
2014-10-15 13:09:34.506 Info: rest-ingestion:                         
2014-10-15 13:09:34.506 Info: rest-ingestion:                      
2014-10-15 13:09:34.506 Info: rest-ingestion:                      
2014-10-15 13:09:34.506 Info: rest-ingestion:                   
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:                   
2014-10-15 13:09:34.506 Info: rest-ingestion:                      ABC INC.
2014-10-15 13:09:34.506 Info: rest-ingestion:                      Active
2014-10-15 13:09:34.506 Info: rest-ingestion:                      
2014-10-15 13:09:34.506 Info: rest-ingestion:                         
2014-10-15 13:09:34.506 Info: rest-ingestion:                      
2014-10-15 13:09:34.506 Info: rest-ingestion:                      
2014-10-15 13:09:34.506 Info: rest-ingestion:                   
2014-10-15 13:09:34.506 Info: rest-ingestion:                
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:                England and Wales
2014-10-15 13:09:34.506 Info: rest-ingestion:             
2014-10-15 13:09:34.506 Info: rest-ingestion:             REPO-GMR-20002010-09-234076
David Sewell | 14 Oct 21:28 2014

[MarkLogic Dev General] xdmp:output serialization options not working?

Given this code:

xquery version "1.0-ml";
declare option xdmp:output "indent-untyped=yes";
declare option xdmp:output "omit-xml-declaration=yes";
xdmp:document-insert(
   "/test.xml",
   <doc>
     <line>line 1</line>
     <line>line2</line>
   </doc>
) ;

doc("/test.xml")

The output I'm getting is

<?xml version="1.0" encoding="UTF-8"?>
<doc><line>line 1</line><line>line2</line></doc>

which is disobeying both serialization options I specified. ML version 7.0-4.
Are other people seeing this? Am I missing something?

David

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell@...   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
Gary Russo | 14 Oct 21:16 2014

Re: [MarkLogic Dev General] How to optimize the REST API Bulk Ingestion Performance?

Hello Danny,

 

Yes, I’m using 7.0-4.

 

>> What are you comparing it to on the Oracle side?

>> In MarkLogic, the content will all be indexed and searchable.  Is that true on the Oracle side too?

 

The Oracle side is doing a basic CLOB insert with no indexing.

 

The Oracle server being compared to is a higher capacity system so we expected to see a faster ingestion.

 

I didn’t expect the MarkLogic side to be 4 times slower.

 

Yes, we tried tweaking the batch size. The 500 batch size had the fastest load times.

 

I will investigate further but I believe the bottleneck is on the MarkLogic side.

 

I believe the MarkLogic CPU has some room for parallelizing.

 

I’ll create a custom REST Extension that will spawn multiple threads for the doc-inserts.

 

I assume the REST API bulk ingestion already does this but I can’t say for sure.

 

I’ll keep you posted.

 

Thanks Danny

 

- Gary R

 

 

 

From: general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org [mailto:general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org] On Behalf Of Danny Sokolsky
Sent: Tuesday, October 14, 2014 2:00 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to optimize the REST API Bulk Ingestion Performance?

 

Hi Gary,

 

A few thoughts here.  You are using 7.0-4 on this? 

 

What are you comparing it to on the Oracle side?  In MarkLogic, the content will all be indexed and searchable.  Is that true on the Oracle side too?

 

What indexes do you have enabled?  Maybe you do not need them all (or maybe you should put the equivalent indexing on the Oracle side)?

 

Have you tried tweaking the batch size?  I would try a smaller number, say 50 or 100.

 

Have you analyzed where you are spending the time?  In the C# code?  In the code loading the doc on MarkLogic?
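
One way to answer that question on the client side is to accumulate wall-clock time per phase. This is only a sketch, written in Python rather than the C# used in the thread; the class name and the phase labels are made up for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates elapsed wall-clock seconds per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add to the running total even if the phase raised an error.
            self.totals[name] += time.perf_counter() - start

    def slowest(self):
        """Return the name of the phase with the largest total, or None."""
        return max(self.totals, key=self.totals.get) if self.totals else None
```

Wrapping the conversion step in `with timer.phase("convert"):` and the HTTP call in `with timer.phase("upload"):` for every batch would show whether the time is going into client-side work or into the round trip to the server.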

 

Do you have multiple threads loading from your .NET program?  If you are not maxing out your CPU on the MarkLogic side, you probably have room for more parallelization.
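
A rough sketch of what multi-threaded loading with a tunable batch size could look like, again in Python rather than C#. The `send_batch` argument is a hypothetical caller-supplied function that would POST one batch to the bulk endpoint and return the number of documents written; no MarkLogic call is made here, and the default batch size and worker count are placeholders to be tuned by measurement.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(docs, batch_size):
    """Split a list of documents into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]


def load_in_parallel(docs, send_batch, batch_size=100, workers=4):
    """Send batches concurrently and return the total count reported.

    send_batch is a hypothetical function supplied by the caller: it takes one
    batch (a list of documents) and returns how many documents it wrote.
    """
    batches = make_batches(docs, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in batch order and re-raises any worker error.
        return sum(pool.map(send_batch, batches))
```

Raising `workers` until the MarkLogic host's CPU is close to saturated, and comparing a few batch sizes at each setting, would give a fairer picture of the achievable load rate than a single-threaded run.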

 

-Danny

 

From: general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org [mailto:general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org] On Behalf Of Gary Russo
Sent: Tuesday, October 14, 2014 9:21 AM
To: general-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org
Subject: [MarkLogic Dev General] How to optimize the REST API Bulk Ingestion Performance?

 

MarkLogic Bulk ingestion processing is slower than an equivalent Oracle ingestion process.

 

The MarkLogic ingestion takes 30 minutes. An Oracle equivalent only takes 7 minutes.

 

I’m using the REST API to bulk ingest multiple documents as described here. => http://docs.marklogic.com/guide/rest-dev/bulk#id_54649

 

Notes:

- C# code is used to call the MarkLogic Bulk Ingest REST API.
- Document batch size used is 500.
- Average doc size is 1 KB.
- JSON Conversion and Validation logic occurs in the C# code.

 

 

Any thoughts on how to optimize the MarkLogic bulk ingest to make it as fast as Oracle’s 7 minute load time?

 

 

Thanks,

Gary R

 

 

Gary Russo

Enterprise NoSQL Developer

http://garyrusso.wordpress.com

 

Paul M | 14 Oct 19:44 2014

Re: [MarkLogic Dev General] How to optimize the REST API Bulk Ingestion Performance?

Is the underlying hardware (disk, volume, logical, striping, pipe) comparable? Other than bulk ingest, is ML speed acceptable?

From: "general-request-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org" <general-request-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org>
To: general-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org
Sent: Tuesday, October 14, 2014 12:58 PM
Subject: General Digest, Vol 124, Issue 30

Send General mailing list submissions to
    general-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://developer.marklogic.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
    general-request-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org

You can reach the person managing the list at
   
general-owner-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of General digest..."


Today's Topics:

  1. Determining ID of database on another    cluster (Danny Sinang)
  2. Re: How to call xdmp:eval() with    transaction-id option?
      (Gary Russo)
  3. How to optimize the REST API Bulk Ingestion    Performance?
      (Gary Russo)
  4. Re: Determining ID of database on another    cluster (Danny Sinang)


----------------------------------------------------------------------

Message: 1
Date: Tue, 14 Oct 2014 11:16:51 -0400
From: Danny Sinang <d.sinang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: [MarkLogic Dev General] Determining ID of database on another
    cluster
To: general <General-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org>
Message-ID:
    <CAPKs-UKbrcQpB90W7DE+4++Ga-ToHYHuWO+zpA07KQwTqKLOtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Content-Type: text/plain; charset="utf-8"

I'm trying to write an XQUERY script to configure our production cluster
(running ML 7.0-3) to replicate some of its databases to a new dev cluster.

My plan is to loop through the names of the production databases that I
want replicated and call :

*1. admin:database-foreign-replica($foreign-cluster-id, $foreign-db-id) *and

*2. admin:database-set-foreign-replicas () *

for each database.

Question is, how do I determine *$foreign-db-id* at runtime ? I know
before-hand its name (the same as the production database name), but I
can't seem to find the right function to use to  get the ID of that
"foreign" database.

Any ideas ?

There is of course the option of me running a script on the dev cluster to
generate the db IDs and copying them over to my master cluster script as
hardcoded values, but I'm trying to avoid hardcoding as much as possible.

Regards,
Danny
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://developer.marklogic.com/pipermail/general/attachments/20141014/2c2eb45a/attachment-0001.html

------------------------------

Message: 2
Date: Tue, 14 Oct 2014 11:48:33 -0400
From: Gary Russo <garyrusso-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org>
Subject: Re: [MarkLogic Dev General] How to call xdmp:eval() with
    transaction-id option?
To: "'MarkLogic Developer Discussion'"
    <general-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org>
Message-ID: <BLU182-DS2590858171305BA8AD4142AAAD0-MsuGFMq8XAE@public.gmane.org>
Content-Type: text/plain; charset="us-ascii"

Thanks John



I'm using xdmp:transaction-create() with the update mode option.



xdmp:transaction-create(
  <options xmlns="xdmp:eval">
    <transaction-mode>update</transaction-mode>
  </options>
)





I'm using this xdmp:eval() with the transaction-id option.



xdmp:eval(
  $evalCmd,
  (xs:QName("uri"), $uri),
  <options xmlns="xdmp:eval">
    <transaction-id>{$longTxId}</transaction-id>
  </options>
)





I posted the REST Extension Code here. =>
https://github.com/garyrusso/GLM-Search



I also posted a C# tool that I use to call the REST Extension APIs to test
the ACID rollbacks. => https://github.com/garyrusso/ACIDTester







-----Original Message-----
From: general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org
[mailto:general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org] On Behalf Of John Snelson
Sent: Tuesday, October 07, 2014 5:29 AM
To: general-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org
Subject: Re: [MarkLogic Dev General] How to call xdmp:eval() with
transaction-id option?



Grep the server Modules and Apps directory to find uses of it. You probably
also want to find out about xdmp:transaction-create() as well.



Maybe it's time we documented this functionality - I've seen lots of people
wanting to use it recently.



John



On 06/10/14 23:55, Gary Russo wrote:

> I'm creating a set of REST extensions that will be used in

> Multi-Statement ACID Transactions.

>

> The underlying code will use xdmp:eval() with the transaction-id option.

>

> Unfortunately, the transaction-id option is undocumented.

>

> Can someone please provide an example of using xdmp:eval() with a

> transaction-id option?

>

>

> Here's the RESTful APIs that are being created.

>

> 1. POST /transaction    (: Returns a transaction-id. e.g., 11111111 :)
> 2. GET /inventory?rs:type=artichoke&rs:transId=11111111
> 3. GET /inventory?rs:type=bongo&rs:transId=11111111
> 4. PUT /inventory?rs:type=artichoke&rs:transId=11111111&rs:action=decr&rs:quantity=3
> 5. PUT /inventory?rs:type=bongo&rs:transId=11111111&rs:action=decr&rs:quantity=1
> 6. POST /order?rs:transId=11111111    (: Prepares Document Insert :)
> 7. PUT /transaction?rs:transId=11111111    (: commits transaction :)
> 8. DELETE /transaction?rs:transId=11111111    (: rolls back transaction :)

>

>

> Thanks in advance.

>

> - GR

>

>

> Gary Russo

> Enterprise NoSQL Developer

> http://garyrusso.wordpress.com

>

>

>

> _______________________________________________

> General mailing list

>  <mailto:General-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org> General-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org

>  <http://developer.marklogic.com/mailman/listinfo/general>
http://developer.marklogic.com/mailman/listinfo/general

>





--

John Snelson, Lead Engineer                    <http://twitter.com/jpcs>
http://twitter.com/jpcs

MarkLogic Corporation                          <http://www.marklogic.com>
http://www.marklogic.com

_______________________________________________

General mailing list

<mailto:General-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org> General <at> developer.marklogic.com

<http://developer.marklogic.com/mailman/listinfo/general>
http://developer.marklogic.com/mailman/listinfo/general

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://developer.marklogic.com/pipermail/general/attachments/20141014/afc249ec/attachment-0001.html

------------------------------

Message: 3
Date: Tue, 14 Oct 2014 12:20:46 -0400
From: Gary Russo <garyrusso-PkbjNfxxIARBDgjK7y7TUQ@public.gmane.org>
Subject: [MarkLogic Dev General] How to optimize the REST API Bulk
    Ingestion    Performance?
To: <general-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org>
Message-ID: <BLU182-DS13EC6B2F241B5CAF68D26EAAAD0-MsuGFMq8XAE@public.gmane.org>
Content-Type: text/plain; charset="us-ascii"

MarkLogic Bulk ingestion processing is slower than an equivalent Oracle
ingestion process.



The MarkLogic ingestion takes 30 minutes. An Oracle equivalent only takes 7
minutes.



I'm using the REST API to bulk ingest multiple documents as described here.
=> http://docs.marklogic.com/guide/rest-dev/bulk#id_54649



Notes:

.        C# code is used to call the MarkLogic Bulk Ingest REST API.

.        Document batch size used is 500.

.        Average doc size is 1 KB.

.        JSON Conversion and Validation logic occurs in the C# code.





Any thoughts on how to optimize the MarkLogic bulk ingest to make it as fast
as Oracle's 7 minute load time?
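
For reference, the batching described in the notes above can be sketched as follows. This is a minimal Python illustration rather than the C# code in question. It assumes the multipart/mixed layout from the bulk guide linked above (one part per document, with the document URI carried in a Content-Disposition header); the function names and the /docs/... URIs are hypothetical, and actually sending the requests is left out.

```python
import json
import uuid
from typing import Dict, Iterator, List, Tuple

Doc = Tuple[str, dict]  # (document URI, JSON-serializable content)


def batches(items: List[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_multipart(docs: List[Doc], boundary: str) -> bytes:
    """Serialize one batch as a multipart/mixed request body."""
    lines: List[str] = []
    for uri, doc in docs:
        lines += [
            f"--{boundary}",
            "Content-Type: application/json",
            # Assumed convention: the target URI travels in the filename.
            f'Content-Disposition: attachment; filename="{uri}"',
            "",
            json.dumps(doc),
        ]
    lines += [f"--{boundary}--", ""]
    return "\r\n".join(lines).encode("utf-8")


def build_requests(items: List[Doc], size: int = 500) -> List[Tuple[Dict[str, str], bytes]]:
    """Return one (headers, body) pair per batch, ready to POST."""
    requests = []
    for batch in batches(items, size):
        boundary = uuid.uuid4().hex
        headers = {"Content-Type": f"multipart/mixed; boundary={boundary}"}
        requests.append((headers, build_multipart(batch, boundary)))
    return requests
```

Because each (headers, body) pair is independent, the batches could be posted from several threads or connections at once. Whether that narrows the gap to the Oracle load time has not been measured here.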





Thanks,

Gary R





Gary Russo

Enterprise NoSQL Developer

http://garyrusso.wordpress.com



-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://developer.marklogic.com/pipermail/general/attachments/20141014/acdda08a/attachment-0001.html

------------------------------

Message: 4
Date: Tue, 14 Oct 2014 12:58:24 -0400
From: Danny Sinang <d.sinang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [MarkLogic Dev General] Determining ID of database on
    another    cluster
To: general <General-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org>
Message-ID:
    <CAPKs-U+saScUtgZw9dEvGDCT44=6aoyAWfeB_N10hHQPCKVZ_Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Content-Type: text/plain; charset="utf-8"

Figured it out.

Had to call xdmp:foreign-cluster-status() and extract the database id from
the list of databases it returns.

Code below is based mostly from
*/opt/MarkLogic/Admin/lib/dbrep-configure-2-form.xqy* .

Regards,
Danny

==============================================================

xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

declare namespace hs="http://marklogic.com/xdmp/status/host";
declare namespace fc="http://marklogic.com/xdmp/status/foreign-cluster";

declare function local:get-local-bootstrap-host($config){
    let $bootstrap-hosts := admin:cluster-get-xdqp-bootstrap-hosts($config)
    return
        if(fn:count($bootstrap-hosts) eq 0) then
            fn:error((),"ADMIN-NOBOOTSTRAPHOSTCONFIGURED",())
        else
            let $first-available-host :=
                (xdmp:host-status($bootstrap-hosts)[fn:not(fn:exists(hs:error))]/hs:host-id)[1]
            return
                if(fn:exists($first-available-host)) then
                    $first-available-host
                else
                    fn:error((),"ADMIN-NOBOOTSTRAPHOSTONLINE",())
};

let $config := admin:get-configuration()
let $foreign-cluster-id := admin:cluster-get-foreign-cluster-id($config, "my-dev-cluster")
let $host-id := local:get-local-bootstrap-host($config)
let $fc-status := xdmp:foreign-cluster-status($host-id, xs:unsignedLong($foreign-cluster-id))
return
    $fc-status/fc:foreign-databases/fc:foreign-database[fc:foreign-database-name eq 'my-database']
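
The same name-to-ID lookup, extended to several database names at once, can be sketched in Python against a serialized copy of the status document. This is illustration only: the sample XML is hand-written, and the root element name and the fc:foreign-database-id element are assumptions. Only foreign-databases, foreign-database and foreign-database-name are taken from the query above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

FC_NS = {"fc": "http://marklogic.com/xdmp/status/foreign-cluster"}

# Hypothetical sample; the root name and foreign-database-id are assumed.
SAMPLE = """<fc:foreign-cluster-status
    xmlns:fc="http://marklogic.com/xdmp/status/foreign-cluster">
  <fc:foreign-databases>
    <fc:foreign-database>
      <fc:foreign-database-id>1001</fc:foreign-database-id>
      <fc:foreign-database-name>my-database</fc:foreign-database-name>
    </fc:foreign-database>
    <fc:foreign-database>
      <fc:foreign-database-id>1002</fc:foreign-database-id>
      <fc:foreign-database-name>other-db</fc:foreign-database-name>
    </fc:foreign-database>
  </fc:foreign-databases>
</fc:foreign-cluster-status>"""


def foreign_db_ids(status_xml: str, names: Iterable[str]) -> Dict[str, str]:
    """Map each requested database name to its ID on the foreign cluster."""
    wanted = set(names)
    root = ET.fromstring(status_xml)
    found: Dict[str, str] = {}
    for db in root.findall("fc:foreign-databases/fc:foreign-database", FC_NS):
        name = db.findtext("fc:foreign-database-name", namespaces=FC_NS)
        if name in wanted:
            found[name] = db.findtext("fc:foreign-database-id", namespaces=FC_NS)
    missing = wanted - set(found)
    if missing:
        # Fail loudly instead of configuring replication with a missing ID.
        raise KeyError(f"not found on foreign cluster: {sorted(missing)}")
    return found
```

Raising on a missing name keeps a replication script from silently skipping a database that does not yet exist on the dev cluster.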

On Tue, Oct 14, 2014 at 11:16 AM, Danny Sinang <d.sinang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I'm trying to write an XQUERY script to configure our production cluster
> (running ML 7.0-3) to replicate some of its databases to a new dev cluster.
>
> My plan is to loop through the names of the production databases that I
> want replicated and call :
>
> *1. admin:database-foreign-replica($foreign-cluster-id, $foreign-db-id) *
> and
> *2. admin:database-set-foreign-replicas () *
>
> for each database.
>
> Question is, how do I determine *$foreign-db-id* at runtime ? I know
> before-hand its name (the same as the production database name), but I
> can't seem to find the right function to use to  get the ID of that
> "foreign" database.
>
> Any ideas ?
>
> There is of course the option of me running a script on the dev cluster to
> generate the db IDs and copying them over to my master cluster script as
> hardcoded values, but I'm trying to avoid hardcoding as much as possible.
>
> Regards,
> Danny
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://developer.marklogic.com/pipermail/general/attachments/20141014/bdcc501c/attachment.html

------------------------------

_______________________________________________
General mailing list
General-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org
http://developer.marklogic.com/mailman/listinfo/general


End of General Digest, Vol 124, Issue 30
****************************************


Gary Russo | 14 Oct 17:48 2014
Picon

Re: [MarkLogic Dev General] How to call xdmp:eval() with transaction-id option?

Thanks John

 

I'm using xdmp:transaction-create() with the update mode option.

 

xdmp:transaction-create(
  <options xmlns="xdmp:eval">
    <transaction-mode>update</transaction-mode>
  </options>
)

 

 

I'm using this xdmp:eval() with the transaction-id option.

 

xdmp:eval(
  $evalCmd,
  (xs:QName("uri"), $uri),
  <options xmlns="xdmp:eval">
    <transaction-id>{$longTxId}</transaction-id>
  </options>
)

 

 

I posted the REST Extension Code here. => https://github.com/garyrusso/GLM-Search

 

I also posted a C# tool that I use to call the REST Extension APIs to test the ACID rollbacks. => https://github.com/garyrusso/ACIDTester

 

 

 

-----Original Message-----
From: general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org [mailto:general-bounces-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org] On Behalf Of John Snelson
Sent: Tuesday, October 07, 2014 5:29 AM
To: general-ld4jwAGwUXTgXEvjvSGRgMKenhbt+owO@public.gmane.org
Subject: Re: [MarkLogic Dev General] How to call xdmp:eval() with transaction-id option?

 

Grep the server Modules and Apps directory to find uses of it. You probably also want to find out about xdmp:transaction-create() as well.

 

Maybe it's time we documented this functionality - I've seen lots of people wanting to use it recently.

 

John

 

On 06/10/14 23:55, Gary Russo wrote:

> I'm creating a set of REST extensions that will be used in

> Multi-Statement ACID Transactions.

>

 

> The underlying code will use xdmp:eval() with the transaction-id option.

>

 

> Unfortunately, the transaction-id option is undocumented.

>

 

> Can someone please provide an example of using xdmp:eval() with a

> transaction-id option?

>

 

>

 

> Here's the RESTful APIs that are being created.

>

 

> 1. POST /transaction    (: Returns a transaction-id. e.g., 11111111 :)
> 2. GET /inventory?rs:type=artichoke&rs:transId=11111111
> 3. GET /inventory?rs:type=bongo&rs:transId=11111111
> 4. PUT /inventory?rs:type=artichoke&rs:transId=11111111&rs:action=decr&rs:quantity=3
> 5. PUT /inventory?rs:type=bongo&rs:transId=11111111&rs:action=decr&rs:quantity=1
> 6. POST /order?rs:transId=11111111    (: Prepares Document Insert :)
> 7. PUT /transaction?rs:transId=11111111    (: commits transaction :)
> 8. DELETE /transaction?rs:transId=11111111    (: rolls back transaction :)
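
The call sequence listed above can be sketched from the client side as follows. This is a minimal Python illustration that only builds the (method, path) pairs for steps 2 through 8 once step 1 has returned a transaction ID. The helper names are hypothetical, and no request is actually sent.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

Request = Tuple[str, str]  # (HTTP method, path with query string)


def rs_url(path: str, **params) -> str:
    """Append rs:-prefixed query parameters, as the listed endpoints use."""
    if not params:
        return path
    # safe=":" keeps the colon in the "rs:" prefix from being percent-encoded.
    query = urlencode({f"rs:{key}": value for key, value in params.items()}, safe=":")
    return f"{path}?{query}"


def order_workflow(trans_id: str, quantities: Dict[str, int], commit: bool = True) -> List[Request]:
    """Build the requests that follow POST /transaction for one order."""
    steps: List[Request] = []
    for item in quantities:  # steps 2-3: read current stock
        steps.append(("GET", rs_url("/inventory", type=item, transId=trans_id)))
    for item, qty in quantities.items():  # steps 4-5: decrement stock
        steps.append(("PUT", rs_url("/inventory", type=item, transId=trans_id,
                                    action="decr", quantity=qty)))
    steps.append(("POST", rs_url("/order", transId=trans_id)))  # step 6
    # Step 7 commits the transaction; step 8 rolls it back instead.
    steps.append(("PUT" if commit else "DELETE", rs_url("/transaction", transId=trans_id)))
    return steps
```

For example, order_workflow("11111111", {"artichoke": 3, "bongo": 1}) yields the six requests of steps 2 through 7, and passing commit=False swaps the final PUT for the DELETE of step 8.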

>
> Thanks in advance.
>
> - GR
>
> Gary Russo
> Enterprise NoSQL Developer
> http://garyrusso.wordpress.com
>

--
John Snelson, Lead Engineer                    http://twitter.com/jpcs
MarkLogic Corporation                         http://www.marklogic.com

_______________________________________________

General mailing list

General-ld4jwAGwUXTgXEvjvSGRgNi2O/JbrIOy@public.gmane.orgic.com

http://developer.marklogic.com/mailman/listinfo/general
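
For readers following the thread, the REST sequence quoted above can be sketched from the client side. This is only an illustration in Python, not code from the thread or from the linked repositories: the host and port, the function names, and the choice to build the requests without sending them are all assumptions. The paths and the rs: parameters are copied from the quoted list.

```python
from urllib.parse import urlencode

# Hypothetical host and port; the thread does not say where the extensions run.
BASE = "http://localhost:8040"


def build_url(path, params):
    # safe=":" keeps the "rs:" parameter prefix unescaped, as written in the list.
    return f"{BASE}{path}?{urlencode(params, safe=':')}"


def transaction_plan(trans_id, decrements, commit=True):
    """Return (method, url) pairs for the requests that follow step 1.

    trans_id   -- the id assumed to have come back from step 1 (POST /transaction)
    decrements -- maps an inventory type to the quantity to subtract
    commit     -- True ends with PUT (commit), False with DELETE (rollback)
    """
    plan = []
    # Steps 2-3: read each inventory item inside the transaction.
    for item in decrements:
        plan.append(("GET", build_url(
            "/inventory", {"rs:type": item, "rs:transId": trans_id})))
    # Steps 4-5: decrement each item inside the same transaction.
    for item, quantity in decrements.items():
        plan.append(("PUT", build_url("/inventory", {
            "rs:type": item, "rs:transId": trans_id,
            "rs:action": "decr", "rs:quantity": quantity})))
    # Step 6: prepare the order document insert.
    plan.append(("POST", build_url("/order", {"rs:transId": trans_id})))
    # Step 7 or 8: the list offers both endings; a client picks one of them.
    method = "PUT" if commit else "DELETE"
    plan.append((method, build_url("/transaction", {"rs:transId": trans_id})))
    return plan


if __name__ == "__main__":
    for method, url in transaction_plan("11111111", {"artichoke": 3, "bongo": 1}):
        print(method, url)
```

Run with the values from the list, this prints requests 2 through 7; passing commit=False replaces the final PUT with the DELETE from step 8, which is the path a rollback test would exercise.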

Danny Sinang | 14 Oct 17:16 2014

[MarkLogic Dev General] Determining ID of database on another cluster

I'm trying to write an XQuery script to configure our production cluster (running ML 7.0-3) to replicate some of its databases to a new dev cluster.

My plan is to loop through the names of the production databases that I want replicated and call:

1. admin:database-foreign-replica($foreign-cluster-id, $foreign-db-id) and
2. admin:database-set-foreign-replicas()

for each database.

The question is: how do I determine $foreign-db-id at runtime? I know its name beforehand (it is the same as the production database name), but I can't find a function that returns the ID of that "foreign" database.

Any ideas?

There is of course the option of running a script on the dev cluster to generate the database IDs and copying them into my master cluster script as hardcoded values, but I'm trying to avoid hardcoding as much as possible.

Regards,
Danny


ville | 13 Oct 09:57 2014

[MarkLogic Dev General] Adding new fields

Hi,
 
When developing applications with ML as the database, we regularly need to add new indexes to deliver new features. Most of the time (probably 95%), the new indexes will not match any content currently in the database, but we know they eventually will once new content is added.

As we have terabytes / millions of docs of content, these reindex operations can be costly and take considerable time to run.

So, finally, to the question: when we add a new field that has one include (limited by element and attribute value), ML seems to go through all documents in the database. Is there a way to tell ML that we know, and take responsibility for the fact, that the database currently has no content that needs to be reindexed, so that even though the database-wide "reindexer enable" setting is on, it does not do any reindexing for this field?

Would it work to toggle "reindexer enable" off while adding the fields and then toggle it back on? What about new documents added while the reindexer is off? (We don't have the luxury of stopping writes at any given time.)
 
Ville