xiaona han | 19 Mar 2011 07:33

A question about project " Lua bindings for Xapian"

Hello all,

I am interested in the project "Supporting another language (Lua)". Its difficulty is listed as medium-hard. My question is: roughly how many lines of code are needed to implement this project? I found that some of the code in the other language bindings is generated by SWIG. Which parts of the code do we need to write ourselves, and which parts are generated by SWIG?



Best Regards,
               Xiaona

_______________________________________________
Xapian-devel mailing list
Xapian-devel <at> lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-devel
Sumith Matharage | 19 Mar 2011 10:24

GSOC 2011 : Weighting Schemes

Hi All,


I'm Sumith, a postgraduate student at Monash University. I'm working in the area of text weighting schemes and text mining. When I was going through the GSoC project list, I became interested in the 'Weighting Schemes' project. I have worked with different weighting schemes such as TF-IDF, and I would love to join and contribute my ideas to this project.

Thanks,
Sumith
Olly Betts | 19 Mar 2011 11:38

Re: A question about project " Lua bindings for Xapian"

On Sat, Mar 19, 2011 at 02:33:13PM +0800, xiaona han wrote:
> I am interested in the project "Supporting another language(Lua)". Its
> difficulty is medium-hard. Then my question is: How many lines of code does
> it need to implement this project? I found some code in other languages
> binding is generated by SWIG. Then what part of code do we need to write,
> and what part of code are generated by SWIG?

If you look at the code in the version control system (either svn, or
the git mirror) it should be clearer as none of the generated files
are checked in (whereas the tar files we distribute include the
generated files for convenience).

If you look at one of the languages where the wrapping has been done
in a natural way, you should see the sort of thing to aim for.  Python
is probably the best example of this (but ignore the *3.py files which
are mostly generated from the *2.py files by the Python 2to3 tool, with
some manual tweaks to work around issues with that tool).

Cheers,
    Olly
Olly Betts | 19 Mar 2011 11:41

Re: GSOC 2011 : Weighting Schemes

On Sat, Mar 19, 2011 at 08:24:21PM +1100, Sumith Matharage wrote:
> I'm Sumith, a postgraduate student in Monash university. I'm working in the
> area of Text weighting schemes and Text Mining. When I'm going through the
> GSOC project list, I felt interested in the 'Weighting Schemes' project. At
> the moment, I have worked with different weighting schemes as TF-IDF and
> would love to join and contribute with my ideas in this project.

OK, sounds like you're well qualified on the theory side there then.
Did you have any particular questions?

Cheers,
    Olly
Praveen Kumar | 19 Mar 2011 12:50

Weighting Schemes

Hi!
I am Praveen Kumar, an Applied Mathematics student, and I am interested in developing additional weighting schemes for Xapian through GSoC.
I have not taken a formal course in Information Retrieval at our institute. The theory I know so far comes
from the Xapian documentation and the other references and resources mentioned on the website, which I read
in order to design our own probabilistic Information Retrieval framework for an institute project in which we have to
design a crawler, an indexer and a query processor.
    The question put to us was whether we should use Xapian to avoid reinventing the wheel, or write our own code in order to learn how Information Retrieval systems work.
Our mentor ruled in favor of writing our own code in Python. The project is only 18 days old; I have finished writing the crawler and am now working on the indexer.
     My happiness knew no bounds when I saw Xapian in the list of selected organisations for GSoC. So far my open source contributions have been limited to
reporting bugs and writing documentation diffs. I wish to contribute to Xapian in a major way.
    At present I am studying various weighting schemes that Xapian could be extended with. I hope the Xapian community will help me on this journey.

Thank You
Praveen Kumar
Int. M.Sc. in Applied Mathematics
Indian Institute of Technology Roorkee, India

Abdul Rauf | 20 Mar 2011 13:25

GSoC 2011 - Improve Existing Bindings

Dear Olly,


I am very excited to contribute to the open source community through Google Summer of Code 2011. I have read the “Xapian” ideas list for GSoC 2011 at http://trac.xapian.org/wiki/GSoCProjectIdeas, and I am interested in working on “Improve Existing Bindings”. The reason for my interest is that I have previously worked on .NET projects in both VB and C#. I have also worked on C++ projects.


I am writing to let you know my understanding of the project, and I would like feedback on it. My understanding of the requirements is as follows:

    • To study the current bindings. For C#, I am studying this page (http://xapian.org/docs/bindings/csharp/)
    • To implement improvements in the current bindings.

Could you please point me to some material to study in this regard? Also, what kinds of binding improvements does Xapian need? I look forward to your response.


Regards


Rauf
University of Gloucestershire

James Aylett | 20 Mar 2011 15:28

Re: GSoC 2011 - Improve Existing Bindings

On 20 Mar 2011, at 12:25, Abdul Rauf wrote:

> Dear Olly,

I've added myself in as a potential mentor on the bindings projects, so I'll answer this in the first instance.

> I am very excited to contribute in open source community through the platform of Google Summer of Code
> 2011. I have visited the ideas of “Xapian” at http://trac.xapian.org/wiki/GSoCProjectIdeas for
> GSoC 2011. I am interested in working on “Improve Existing Bindings”. The reason for my interest is
> that I have previously worked on .NET related projects both in VB and C#. I also have worked on C++ projects.

Excellent.

> I am writing to let you know my understanding of the project and would like to have a feedback on it. My
> understanding of the requirements is as follows:
> 
> 	• To study current bindings. In case of C#, I am studying this link (http://xapian.org/docs/bindings/csharp/)
> 	• To implement improvements in the current bindings.
> 
> Would you please refer me some material to study in this regard?

Beyond the link you already have, the main information about the C# bindings will be the SWIG code itself, in
xapian-bindings <http://svn.xapian.org/trunk/xapian-bindings/>. SWIG bindings split into two
parts: there's generic code shared across all the bindings (the *.i files in the xapian-bindings root),
and language-specific files (in a subdirectory; there are two main files, util.i and extra.i; generally
speaking util.i is for SWIG "typemaps", which convert arguments between the target language and C++,
while extra.i is currently only used by the Python bindings to inject additional Python code, mostly to
provide more idiomatic iterators).

> Also what kinds of binding improvements are required by Xapian? I look forward to your response.

The intention of bindings is to be as idiomatic in the target language as possible; so for instance we rename
methods and classes to match the language conventions where possible. (There's a lot of this done for C#
already, mostly automatically by SWIG, but it's obviously worth checking if there's anything missing
there.) More work, but considerably more rewarding, is ensuring that core language idioms such as
iteration are supported, rather than having to use C++ idioms. (It's been a long while since I played with
C#, so our bindings may already be in good shape here.)
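
As a rough illustration of the iteration point (this is not code from xapian-bindings), the sketch below uses two hypothetical stand-in classes, FakeTermList and FakeIterator, to contrast a C++-style begin()/end() loop with the iteration a Python programmer would expect. The __iter__ method plays the role that injected target-language code could play; every name here is an assumption made up for the example.

```python
class FakeIterator:
    """Hypothetical stand-in for a C++-style iterator exposed by a binding."""

    def __init__(self, items, pos):
        self._items = items
        self._pos = pos

    def get(self):
        return self._items[self._pos]

    def advance(self):
        self._pos += 1

    def equals(self, other):
        return self._pos == other._pos


class FakeTermList:
    """Hypothetical wrapped container with C++-style begin()/end() accessors."""

    def __init__(self, items):
        self._items = list(items)

    def begin(self):
        return FakeIterator(self._items, 0)

    def end(self):
        return FakeIterator(self._items, len(self._items))

    def __iter__(self):
        # The idiomatic layer: hide the begin()/end() dance behind a generator.
        it, end = self.begin(), self.end()
        while not it.equals(end):
            yield it.get()
            it.advance()


terms = FakeTermList(["lua", "swig", "xapian"])

# C++ idiom leaking into the target language.
cpp_style = []
it = terms.begin()
while not it.equals(terms.end()):
    cpp_style.append(it.get())
    it.advance()

# What a user of the target language would rather write.
idiomatic = [term for term in terms]
```

Both loops visit the same items; only the second reads like native code. For C#, the analogous step would likely be making the wrapped classes usable with foreach.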

As the project summary indicates, there's some default SWIG wrapping going on which can probably be
improved. The ultimate aim is that a C# programmer should be able to think in a C# way when using the bindings.

Beyond that, providing detailed examples (SWIG C# supports directors, meaning that user code written in
C# can be passed as callbacks to some parts of Xapian, so it would be good to provide some examples of this),
suitable tests (you don't need to test Xapian itself in detail through the bindings, but you do need to test
anything specific to that language, and obviously some general functionality tests are also good), and
documentation are all important. In addition to the docs on the website / shipped with the code (which
you've found), it may be possible to automatically generate C# XML doc comments; we do something similar
in Python.

James

-- 
 James Aylett
 talktorex.co.uk - xapian.org - devfort.com - spacelog.org
Nikita Smetanin | 20 Mar 2011 15:53

GSoC 2011: Improve Spelling Correction

Hello, I am Nikita Smetanin (ntz), a Russian student. I'm interested in
fuzzy search algorithms (also known as similarity search and spelling
correction), I have some articles and open-source implementations of
related algorithms. I also have good experience in enterprise software
development (Java/C++/C# and related stuff) and in small projects.

I want to work on your project "Improve spelling correction", but I
want to suggest some additions to that project:

- One or several phonetic matching algorithms to improve name and
surname search.
- An alternative algorithm for correction candidate search that is faster
than the trigram approach.
- A more sophisticated word distance metric to improve result set relevance.
- Some work on improving stemming quality.
- Language detection for automatic selection of language-specific algorithms.

I'll be happy to participate in this project during Google Summer of
Code 2011 program and implement most of these ideas.
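
For readers unfamiliar with the terms in the list above, here is a minimal sketch of trigram-based candidate search followed by ranking with a plain Levenshtein distance. It is an illustration only, not Xapian's implementation: the padding character, the function names and the max_distance threshold are all assumptions made for the example.

```python
from collections import defaultdict


def trigrams(word):
    """Return the set of character trigrams of word, padded with '$' (assumed convention)."""
    padded = "$" + word + "$"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_index(words):
    """Map each trigram to the set of dictionary words containing it."""
    index = defaultdict(set)
    for word in words:
        for gram in trigrams(word):
            index[gram].add(word)
    return index


def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def suggest(word, index, max_distance=2):
    """Collect words sharing at least one trigram, then rank them by edit distance."""
    candidates = set()
    for gram in trigrams(word):
        candidates |= index.get(gram, set())
    scored = [(edit_distance(word, cand), cand) for cand in candidates]
    return [cand for dist, cand in sorted(scored) if dist <= max_distance]


index = build_index(["xapian", "lucene", "sphinx", "python"])
suggestions = suggest("xapain", index)
```

A transposition such as "xapain" costs two edits under plain Levenshtein distance, which is one reason a richer distance metric, as proposed above, can rank candidates better.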
Sumith Matharage | 21 Mar 2011 01:49

Re: GSOC 2011 : Weighting Schemes

Hi Olly,

At the moment I have no specific questions, since the project description itself gives all the necessary information very clearly. I thought I would look into the different DFR schemes and the BM25 scheme to refresh and improve my knowledge of them. Since I have worked closely with the TF-IDF model recently, I plan to study the DFR schemes and BM25 in more depth and compare them to understand their pros and cons.
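
To make the TF-IDF versus BM25 comparison concrete, here is a small sketch using one common textbook form of each formula. It is not Xapian's weighting code, the parameter defaults k1=1.2 and b=0.75 are conventional assumptions, and DFR schemes are left out.

```python
import math


def tf_idf(tf, n_docs, doc_freq):
    """Raw term frequency times log inverse document frequency (one common variant)."""
    return tf * math.log(n_docs / doc_freq)


def bm25(tf, n_docs, doc_freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """One common BM25 formulation with saturating tf and length normalisation."""
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


# A term occurring in 10 of 1000 documents, in a document of average length.
for tf in (1, 5, 10, 50):
    print(tf, round(tf_idf(tf, 1000, 10), 3), round(bm25(tf, 1000, 10, 100, 100), 3))
```

The TF-IDF score grows linearly with the term frequency, while the BM25 score flattens out as tf increases and drops for documents longer than average. That saturation and length normalisation are the main practical differences between the two.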

At the same time, since I haven't used Xapian before, I thought I would familiarize myself with it too.

What is your input on that? Do you have any suggestions for improving my knowledge for this project?

Also, I thought I would tell you a little about myself. I graduated from the University of Moratuwa, Sri Lanka, Faculty of Engineering, specializing in Computer Science and Engineering (http://www.cse.mrt.ac.lk/). I topped the engineering batch (a batch of more than 500 students) with a 4.06 GPA (out of 4.20). I was also awarded the Gold Medal for the best Computer Science and Engineering student in 2007. After that I joined the industry and have more than 2 years of experience as a software engineer. In 2009, I was awarded a scholarship by Monash University to carry out my PhD studies, and I am currently working on text weighting and text mining techniques to optimize text clustering results.

Thank you very much for your input.

Cheers,
Sumith 
Maheshwar | 21 Mar 2011 10:37

GSOC 2011 - QueryParser Reimplementation

Hello everyone,
I am Maheshwar, a pre-final year Computer Science undergraduate student at BITS-Pilani, India. When I was going through the GSoC ideas, I became interested in the QueryParser project. So far I have implemented a couple of LL(1) parsers as part of my assignments in a Compiler Construction course, so I would love to join and contribute to this project. Can anyone tell me how to get started on the project, as I am new to Xapian?
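
For context, an LL(1) parser for a toy boolean query language can be written as a small recursive-descent parser. The grammar below (expr := term ("OR" term)*, term := factor ("AND" factor)*, factor := "NOT" factor | "(" expr ")" | WORD) is a made-up example and not the grammar used by Xapian's QueryParser; the class and function names are likewise hypothetical.

```python
import re


def tokenize(text):
    """Split a query into parentheses and whitespace-separated words."""
    return re.findall(r"\(|\)|[^\s()]+", text)


class Parser:
    """Recursive-descent parser that decides each step from one token of lookahead."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "AND":
            self.take()
            node = ("AND", node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "NOT":
            return ("NOT", self.factor())
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in (")", "AND", "OR"):
            raise SyntaxError("unexpected token: %r" % (tok,))
        return ("TERM", tok)


def parse(text):
    """Parse a whole query string and reject trailing tokens."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing token: %r" % (parser.peek(),))
    return tree


tree = parse("lua AND (swig OR NOT python)")
```

Each grammar rule maps to one method, and AND binds tighter than OR because term is nested inside expr.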


--
Regards,
Maheshwar
3rd Year B.E Computer Science
BITS-Pilani, Rajasthan
