martes, 7 de marzo de 2017

Selection uniques linestrings features from two datasources

TOOLS:

ogr2ogr -> www.gdal.org/ogr2ogr.html
gdal_rasterize ->  www.gdal.org/gdal_rasterize.html
Openjump -> www.openjump.org
QGIS -> www.qgis.org
tiffvalue.py -> https://github.com/juanmafont/GisUtilities

Requisites:
 You must have your data sources in a Cartesian EPSG, example EPSG: 3857

The problem:

  You have two data sources, both are linestring, with many similar but not identical elements. How to select those elements that are uniques?

So you have two sets of rivers, or ditches, or ways, or whatever elements thats are represents using linestring, like this, where source A is Red and Source B is Blue and you want to select only those unique elements, eliminating those that are very similar, often even almost overlapping, but they are not exactly the same, you also want to select those segments of an element that are unique.

 ummmm,  It is hard but not impossible, let's see how this can be achieved.

 Detail of the upper yellow area. Left is original (overlapped and similar lines from two datasources), right is the final result (only uniques lines from two original datasources)

In the detailed image you can check that the elements are similar, they are sometimes superimposed, but they are not the same. The detailed image on the right shows in yellow, those elements or segments of elements that are unique.


 Step 1: 

    Choose which data source will be the primary and which will be the secondary. We choose  source-A.shp (Red) has primary and source-B.shp ( Blue ) has secondary.  So the sourceA.shp  (Red) will remain unchanged and we will work over sourceB.shp

 Step 2:  
   
    Ensure source-B.shp is linestring (not multi-linestring)

      ogr2ogr -f "ESRI Shapefile" -nlt LINESTRING -explodecollections -lco ENCODING=UTF-8 sourceB-linestring.shp sourceB.shp

  Note: The trick into above command is "-explodecollections"

Step 3:

   Split sourceB-linestring.shp  into pieces where it crosses with sourceA.shp

  To do this, use the technique described here ( which uses the OpenJump program and the Node tool ) : http://gis.stackexchange.com/questions/98087/split-lines-at-intersection-of-other-lines

    Summary of this technique:
  •  Open two layer into OpenJUMP.
  • Combine selected layers (Layer->Combine Selected Layers). Note: now you can remove originals layers to save memory into openJump.
  • Use Noder tool  (Tools - Edit Geometry - Noder with defaults options) (Can Now you can remove Combined layer to save memory into OpenJUMP)
  • Put over new created layer "Combined Noded" and select Menu->Edit->Extract Layer by Atributted->(Select "Layer" as attribute)
  • You will get original layers but splitted, save new created  sourceB as sourceB-splitted.shp
 Note: If you have problems with memory heap from java into above steps, you can modify JAVA_MAXMEM="-Xmx512M"  into oj_linux.sh / oj_windows.bat file

 So you have now a new splitted and linestring sourceB.shp, so we rename it sourceB-splitted.shp file.

Step 4: 

   We gonna to make a MASK  from sourceA.shp. Firts, open it with OpenJump ( Openjump is very faster than QGIS for many tasks) and choose Tools->Analisis->Buffer

Uses a suitable size for the buffer, for example, a value of 40 meters can be a good choice. So good values are:

    Fixed distance: 40
    Number of segments: 2
    Preserve attributes: Disabled

 Save this new generated layer has buffer-40m-sourceA.shp
  
Step 5:

  Convert new buffer-40m-sourceA.shp to black and white geotiff file,  this file will be our MASK, you can do this from QGIS (with a little trick ) or from command line line with gdal_rasterize..

   NOTE: I am working over Andalusia extension (https://en.wikipedia.org/wiki/Andalusia) so I will use a resolution of 25 meters per pixel, this will result in a geotiff file of approximately 380MB, so choose a resolution according to your needs. Make sure the linestrings are well covered by our buffer/mask but not too much wide around the elements. Look at the example  image to get an idea.


This step can be executed in two ways, from QGIS or from the command line. Here, I will explain both.

 Way 1: With QGIS:  
        Remember, we have now 3 layers:
  • sourceA.shp
  • sourceB-splitted.shp
  • buffer-40m-sourceA.shp
    Do ->Raster->Conversion->Rasterize ( Vector to raster ). Choose as input file buffer-40m-sourceA.shp (IMPORTANT, very easy select another) , choose output file "mask.tiff" into our project directory (QGIS will say file not exists so we need setup output size or resolution) so click ok and select "raster resolution in...." and fill both, horizontal and vertical with same value, we choose 25 (meters per pixel). But do not press the OK button yet.



        We must manually edit the command before pressing the OK button, adding two new parameters. Look at the pencil icon on the bottom right, click it and add these two parameters: 

      -ot BYTE -burn 255

     So the final command to run will look something like this:


     Now you can press the OK button, when finished press OK->OK->CLOSE and you got your new mask layer, move the new mask layer  below sourceA.shp and sourceB-splitted.shp, but upper buffer-40m-sourceA.shp.

  Our new B/W mask layer it is showed all BLACK, so we need correct it.

Select mask layer, right click, choose-> Properties->Style-> and set Max value to 255->Press OK

Now you can see your new mask layer, check it cover all features into layer-A. Do zoom over it and explore.


Way 2. From command line:

Go to project directory and execute:

gdal_rasterize -ot BYTE -burn 255 -tr 25.0 25.0 buffer-40m-sourceA.shp mask.tiff

                    ( that's all, this is the power of console ¡¡¡ )    Now, you can load this raw layer into QGIS ->layer->Add Layer->Add Raster Layer, see above to show it as B/W colors.
                             
STEP 6:

    Now we have to run one last command before getting our final selection of unique items. That is the crux of the matter. We gonna use a python script written by me,  called 'tiffvalue.py'.

  The 'tiffvalue' python script will calculate what percentage of the length of each feature is over the black zone of our mask and save into a field called "tiffvalue".  This is perfect to do a later selection.

  Download 'tiffvalue.py' and 'MaskHeatmap.py' from  https://github.com/juanmafont/GisUtilities make both executable (Note, python need you create an empty file called '__init__.py' into same directory where you save this two scripts ). 

So suppose you have the script in the directory "/home/username/QGIS/scripts/"

Open a console, or terminal, and launch the command:

 ~/QGIS/scripts/tiffvalue.py -hm mask.tiff sourceB-splitted.shp


STEP 7: (The last step) ;) 

   The layer sourceB-splitted.shp has been modified, so you must close it and reload again into QGIS.

   So you have now a sourceB-splitted.shp layer with two new attributes (fields) called 'tiffvalue' and 'length',  the attribute 'tiffvalue' is the percent of  feature's length that is over the black zone of mask.  We will do a selection using this attribute.  

  Into QGIS be sure you have select the new modified layer layerB open "Select features using an expression"





 And write into panel "Expression":

     "tiffvalue" >= 60

(Note, this way we gonna select features with more than 60% of his length over the black zone of our Mask.tiff)

 and press button 'Select',  that is, now right clik over our layer "sourceB-splitted.shp" and select 'Save as'  AND BE SURE SELECT "Save only selected features"

give a new name to the layer, like uniques.shp and press OK button and look at the result, I choose yellow to the new uniques.shp layer.



  NOTE: If you want, you would redoit selection trying other 'tiffvalue' like 40, 50, etc... as percent to get the perfect selection, you could even make successive selections by saving and deleting the previous seleted items. (The example showed has been do it with only one simple selection)

 To avoid very short features be selected, you can add a second condition like:

  "tiffvalue" >= 60 AND "length" >= 100

 Now you have the uniques features into  an untouch and original "layerA.shp" and a new "uniques.shp" layer.

   Good luck ¡¡¡   ;)

Addedum:

In very rare cases, when the elements are very straight and their ends start from other lines (or very close to them), they may not be selected, because if both end nodes are in the white zone of the mask, and there are no intermediate nodes ( Is straight), the script considers that the entire element is over the white area of the mask. To avoid this it is advisable to make a first selection of elements, save them and then delete them, then with the remaining elements, and before running the python script again  "tiffvalue.py", and make a second selection, execute the following command over the file which contains the layer with the rest of elements not chosen:

    ogr2ogr -segmentize 100 -f "ESRI Shapefile"  output.shp input.shp

 This command will "segmentize" the stringlines features into nodes every 100m (select a more suitable value according to your needs)

PD. I'm looking for a job  ;)

 

No hay comentarios:

Publicar un comentario