I thought about this proposal and this is the current state:
The processing part (called "engine") should be separated from the interface (website).
engine - this part processes specific changesets and writes the result into a database
website - frontend to display the stored data (dashboard) and to mark false positives/negatives
extensibility of the engine:
- plugins return a score (integer) which is stored in the database
- different types of plugins:
° changeset scope (i.e.: mass deletion/import, very far movement of nodes)
° multiple changeset scope (i.e.: many changesets within a short time per user)
° user related score (i.e.: date of registration, number of edits, blocked user?)
° area related score - mark a specific area as suspicious for some time (i.e.: vandalism of an area by several users)
- these scores may be summarized by type and then multiplied/weighted
- the engine has to create "fake changesets" containing changes from several related changesets (same user, short time window) to detect split changes
- each changeset has a total rating -> use a threshold value to divide changesets into suspicious and not suspicious
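The scoring scheme above could be sketched roughly as follows (Python chosen only for illustration, since the implementation language is still open; all plugin names, weights, and the threshold value are made-up assumptions, not part of the proposal):

```python
# Illustrative sketch: plugin scores are summed per plugin type,
# each type sum is weighted, and the weighted total is compared
# against a threshold. Every value here is a placeholder.

# Hypothetical raw plugin scores for one changeset, grouped by type.
plugin_scores = {
    "changeset":          [40, 10],  # e.g. mass deletion, far node movement
    "multiple_changeset": [25],      # e.g. many changesets in a short time
    "user":               [15],      # e.g. freshly registered account
    "area":               [0],       # e.g. area not currently flagged
}

# Assumed per-type weights (to be tuned during the project).
weights = {
    "changeset": 1.0,
    "multiple_changeset": 1.5,
    "user": 0.5,
    "area": 2.0,
}

THRESHOLD = 60  # assumed value dividing suspicious from not suspicious

def total_rating(scores):
    """Sum the scores per type, then apply that type's weight."""
    return sum(weights[t] * sum(vals) for t, vals in scores.items())

rating = total_rating(plugin_scores)
suspicious = rating >= THRESHOLD
```

With the placeholder numbers above, the weighted total is 95.0, which lands on the suspicious side of the assumed threshold.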
Some questions came up within this preparation:
- Is there a preferred language? Does this have to be specified within the proposal? (Language skill has to be rated, so I would decide this during the project phase.)
- I would also like to discuss the libraries and frameworks used during the project phase, or should I decide this in my proposal as well?
- Should the frontend integrate into the current website (Ruby on Rails project), or should this just be an optional feature?
- How detailed should the proposal be? Is it enough to formulate this draft?
Point out my mistakes, feel free to ask questions, criticize this draft, and share your ideas and thoughts. :)
On 26 March 2012 at 12:14, kabum <uu.kabum <at> gmail.com> wrote:
me again. Derick answered my PM and I realized that I had missed some features.
The interface should be a simple website listing the suspicious changesets. A possibility to mark false positives and false negatives would also be great.
Derick also suggested an integration with JOSM and mentioned its changeset-reverting capabilities.
On 26 March 2012 at 00:36, kabum <uu.kabum <at> gmail.com> wrote:
On 19 March 2012 at 22:45, Graham Jones <grahamjones139 <at> gmail.com> wrote:
Thank you for your interest in applying for GSoC with OpenStreetMap. This list will be fine for asking questions.
Here are a few suggestions to get you started:
- It is important to understand the fundamentals of what OSM is, so if you have not done so before, please start by creating an account and making some improvements to the map in your local area.
I heard of OSM a long time ago, but was just too lazy to contribute. So I tried it these days and was really surprised how fast changes become visible in the rendered map. I've taken several notes of my surroundings that are waiting to be entered into the OSM database. :)
- It would also be good to look at the OSM data structure. Details of the xml file format can be found on our wiki.
- If you search for Nominatim on the OSM wiki you should find some information on the current service, plus links to the source code, to see how it currently does searching and how it could be improved.
The project idea was suggested by 'sabas88' - could he/she provide some more information on the issues behind this project suggestion please?
I asked him, and the only answer was a link to the GSoC project site in the OSM wiki. :(
I read a lot about OSM, its mechanisms, assistant tools, etc., and also about Nominatim, and I realized that this isn't what I want to do. I've been looking for some other contribution to OSM and GSoC and found the suggestion for a quality assurance tool specialized in edits/changesets (by Derick Rethans). There are many quality assurance tools, but none like this - or have I missed it?
The idea is to have an engine that gets a (set of) changesets or edits and analyses them. It should detect things like logical mistakes, mass deletions without corresponding insertions, etc., and also take user metadata like duration of membership or number of edits into account. It would be great if it compared the changes with the current state of the data in that area and detected pointless checks, where the data is out of date and has already been corrected.
Some other things to keep in mind while planning:
- extensibility through "plugins": the engine calls several detection plugins
- there could be searches for suspicious changesets/edits in a specific area
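The engine-calls-plugins structure could look roughly like the sketch below (again Python only for illustration; the `ChangesetPlugin` interface, the example plugin, and all numbers are hypothetical assumptions):

```python
# Hypothetical sketch of the engine -> detection-plugin structure.
# Class names, method names, and score values are illustrative only.

class ChangesetPlugin:
    """A detection plugin inspects a changeset and returns an integer score."""
    def score(self, changeset):
        raise NotImplementedError

class MassDeletionPlugin(ChangesetPlugin):
    """Flags changesets that delete many objects without inserting any."""
    def score(self, changeset):
        deletes = changeset.get("deletes", 0)
        creates = changeset.get("creates", 0)
        if deletes > 100 and creates == 0:
            return 50  # assumed score for a likely mass deletion
        return 0

def analyse(changeset, plugins):
    """The engine calls each registered plugin and sums the scores."""
    return sum(p.score(changeset) for p in plugins)

plugins = [MassDeletionPlugin()]
print(analyse({"deletes": 500, "creates": 0}, plugins))  # → 50
```

New checks would then just be new `ChangesetPlugin` subclasses registered with the engine, which keeps the detection logic separate from the storage and frontend parts.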
This was just a quick outline of the proposal. Are there some suggestions, wishes, questions or doubts?
In the next days I plan to specify this proposal.
Hope that helps. Please feel free to ask more questions as you develop your proposal.