John,
I was able to pin point that the issue was due to the presence of multiple entries in the dw_loan_schedules table for some accounts, although I am still trying
to figure out why this would give rise to a duplicate entry error. While trying to delete these multiple schedule entries, I stumbled upon a scenario which is not being handled by the ETL at present. I have explained the issue and discussed the fix I developed
below.
·
Upon discussion with my client, I came to know that the changes made to the schedule dates were not because of the stored proc used by them.
·
The changes had occurred due to disbursement being made on a date later than the initial disbursement date.
·
When the disbursement is made on a different date, mifos adjusts the entire schedule according to the new disbursement date.
·
In this process, mifos deletes the existing schedule entries and adds new entries to the loan_schedule table.
·
The ETL currently makes incremental changes to the loan schedule entries by comparing the id in dw_loan_schedules table of the warehouse database
to the id in loan_schedule table of the mifos database.
·
As the old entries have been deleted, the ETL is unable to update this entry and instead, inserts a new entry with the modified schedule date.
·
These multiple entries were later giving rise to duplicate entry error (discussed below) while running the transformation to insert historical balances
and arrears.
·
This issue also gives rise to rows being inserted into the hist_loan_arrears table for these accounts as it considers the old schedule entries for
arrears computation. This is wrong as there are no arrears.
·
The arrears wouldn’t have gotten calculated had the old entries been updated correctly without new entries being inserted.
Fix –
The issue arises primarily due to the fact that the ids from the 2 loan schedule tables do not match in this scenario. Hence, I have ruled out this comparison
by altering the key values to be matched as loan_account_id and installment_id. As this combination remains unaltered even when the id changes, the transformation works fine even in this scenario.
Could you please confirm if this fix would be the right approach to resolve this issue?
I would also like to discuss another constraint I observed while trying to incrementally update the dw_loan_schedules table.
·
The last_updated_date column in the loan_schedule table gets updated/over written for every row even when a modification/updation is done only to
a single installment of the account.
·
For a client with a huge database, it would be inconvenient to populate the warehouse using the live production database. Hence, we take regular backups
and use these in a test server to set up the warehouse.
·
For a backup taken on, say, 23/01/2012, the last_updated_date for a particular loan account was displayed as 19/01/2012 (date closest to the date
when the backup was taken and a change was last made to the account) for all the installments. However, the schedule dates of this account had been modified around 2 months before.
·
I was able to detect this change only when populating the warehouse for parameter date 19/01/2012.
·
This dependency on the last_updated_date column has forced us to use weekly backups to populate data so that changes can be tracked as and when they
had been made.
·
According to my understanding, had the last_updated_date not been modified for all the records, it would have been easier to capture the changes made
installment wise.
Could you please validate the points mentioned above and correct me if my understanding in this case is wrong?
Thanks and regards,
Seetha Rajagopalan • Engineer - Product Development
• Mifos • SunGard • Global Services
Embassy Icon, 3 Infantry Road, Bangalore 560 001, India
Main +91 (80) 3091 3000 X3234
P
Think before you print
CONFIDENTIALITY: This e-mail (including any attachments) may contain confidential, proprietary and privileged information, and unauthorized disclosure or use
is prohibited. If you receive this e-mail in error, please notify the sender and delete this e-mail from your system.
From: John Woodlock [mailto:john.woodlock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org]
Sent: Wednesday, February 01, 2012 6:53 PM
To: Rajagopalan, Seetha
Cc: Bekkinkeri, Chetan
Subject: Re: Require your help in setting up mifos DW
Seetha,
I've looked thru the transformation and I can't see by just inspection how more than one entry per date / account can arise.
I notice you get the duplicate error when inserting into hist_loan_arrears and not hist_loan_balances so I wonder if the 'duplicate' is coming from sub-transformation
"Build Loan Arrears Entry for a Date". If you ran the transformation and disabled that step you could tell whether the duplicates only come from the hist_loan_arrears insertion and not hist_loan_balances. That would narrow down the problem.
John
On Tue, Jan 31, 2012 at 7:53 PM, <Seetha.Rajagopalan-X+QBe+SqkrFBDgjK7y7TUQ@public.gmane.org> wrote:
John,
I am yet to start working on this issue as I have been held up with bringing the warehouse implementation
I am carrying out up to date. I shall send the modified ETL to you once I am done with it.
At the moment, I am facing an error while running the last transformation – Derive Loan Balances
and Arrears. I have pasted the error message below for your reference.
Because of an error, this step can't continue:
org.pentaho.di.core.exception.KettleException:
Error batch inserting rows into table [hist_loan_arrears].
Errors encountered (first 10):
Duplicate entry '2012-01-09-518452' for key 'PRIMARY'
Error updating batch
Duplicate entry '2012-01-09-518452' for key 'PRIMARY'
org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:390)
org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:116)
org.pentaho.di.trans.step.RunThread.run(RunThread.java:40)
java.lang.Thread.run(Unknown Source)
I am unable to debug this as my understanding of the execution of the job was that it runs only once
for every date parameter. If my understanding is correct, there is no possibility for a duplicate entry to be inserted during one iteration. I have also ensured that all records having as_of_date > ‘2012-01-08’ have been deleted before this transformation
was run.
I would like to mention here that there are duplicate entries for this account in the dw_loan_schedules
table as the client had used a stored proc to modify the schedule dates resulting in two different entries for every installment id. Could this be the root cause of the issue? According to my understanding, even if such duplicate entries are present in the
loan schedule tables, it should result in erroneous balance and arrear amounts (which I can fix manually) rather than duplicate entry error.
Will you please help me out?
Thanks and regards,
Seetha Rajagopalan