Data Formats in GIS


Geographic information systems (GIS) generally work with two types of data: vector data and raster data.

Vector Data

Vector data is geographic data stored as a collection of points, polylines, or polygons, with attributes assigned to each feature. Locations are recorded as discrete coordinate pairs (x, y values) called vertices, and these vertices define the shape of the spatial object. The number and arrangement of the vertices determine the type of vector feature: point, polyline, or polygon (Goodchild, 2011).

Points: When a feature’s geometry consists of a single vertex, it is a point feature. Each point is defined by a single x, y coordinate pair, and a point layer can contain many points. Examples of point data include sampling locations, the locations of individual trees, and the locations of light posts.

Vector Points. Source: Documentation QGIS.

Polylines: When the geometry consists of two or more vertices, and the first and last vertex are not the same, it is a polyline feature. The vertices are connected in sequence. For instance, a road, stream, or footpath may be represented by a line composed of a series of segments, where each “bend” in the road or stream is a vertex with a defined x, y location.

A Vector Line. Source: Documentation QGIS.

Polygons: When three or more vertices are involved and the first and last vertex are the same, the vertices enclose an area and form a polygon. The boundaries of schools, sports fields, lakes, states, and countries are often represented by polygons.

A Vector Polygon. Source: Documentation QGIS.
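
The three rules above can be written as a small check. This is a minimal sketch in plain Python, not taken from QGIS or any GIS library: the function name `geometry_type` and the sample coordinates are hypothetical, and treating a closed ring with fewer than three distinct vertices as an error is an assumption.

```python
def geometry_type(vertices):
    """Classify a list of (x, y) tuples as point, polyline, or polygon."""
    if not vertices:
        raise ValueError("a feature needs at least one vertex")
    if len(vertices) == 1:
        return "point"
    if vertices[0] == vertices[-1]:
        # A closed ring repeats its first vertex, so a triangle has 4 entries.
        if len(vertices) < 4:
            raise ValueError("a polygon needs at least three distinct vertices")
        return "polygon"
    return "polyline"


# Made-up sample features.
tree = [(10.0, 53.2)]
footpath = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]
sports_field = [(0, 0), (100, 0), (100, 60), (0, 60), (0, 0)]

print(geometry_type(tree))          # point
print(geometry_type(footpath))      # polyline
print(geometry_type(sports_field))  # polygon
```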

Vector data can be highly precise, since vertices can be located accurately if they are measured and georeferenced carefully. In other words, vector data is only as precise as the way it was measured and digitized (Abubahia and Cocea, 2015). For instance, much cadastral data is surveyed in the field using GPS measurements. An alternative, which also makes it possible to capture historical data, is to scan and digitize paper maps or other sources, which is a labor-intensive process.

Vector data in layers

Most GIS applications group vector features into layers. Features in a layer share the same geometry type and the same set of attributes. For example, if we record all the buildings of our university, they will usually be stored together and shown as a single layer in the GIS, which makes it convenient to work with all the features at once.
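
To make the idea of a layer concrete, the sketch below models one as a plain Python dictionary. The structure is hypothetical (loosely inspired by GeoJSON, not the internal format of any GIS application), and the building names, floor counts, and coordinates are made-up sample values.

```python
# A layer: one geometry type, one set of attribute fields, many features.
buildings_layer = {
    "name": "university_buildings",
    "geometry_type": "polygon",
    "fields": ["name", "floors"],
    "features": [
        {
            "geometry": [(0, 0), (20, 0), (20, 10), (0, 10), (0, 0)],
            "attributes": {"name": "Library", "floors": 3},
        },
        {
            "geometry": [(30, 0), (45, 0), (45, 25), (30, 25), (30, 0)],
            "attributes": {"name": "Lecture hall", "floors": 2},
        },
    ],
}


def shares_schema(layer):
    """Return True if every feature carries exactly the layer's fields."""
    expected = set(layer["fields"])
    return all(set(f["attributes"]) == expected for f in layer["features"])


# Working on the layer as a whole: collect all building names in one pass.
names = [f["attributes"]["name"] for f in buildings_layer["features"]]
print(shares_schema(buildings_layer), names)
```

Because every feature has the same fields, operations such as listing, filtering, or styling can be applied to the whole layer without checking each feature individually.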

Editing vector data

You can create and alter the geometry in a layer using the GIS application, a process known as digitizing. The application enforces the layer's geometry type: you can only create new polygons in a layer that already contains polygons (such as a layer of agricultural dams). Similarly, if you change the shape of a feature, the application will only accept the edit if the resulting shape is valid. For instance, it won't let you edit a line so that it has just one vertex. As noted above for polylines, all lines must have at least two vertices.

The ability to create and edit vector data is crucial because it is one of the primary ways to generate your own data for the topics that interest you. Imagine, for instance, that you are monitoring river contamination. You could use the GIS to digitize every storm water drain outfall (as point features), the river itself (as a polyline feature), and the locations where you took pH readings along the river's course (as a point layer).
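
The editing rules described above can be sketched as simple validation logic. This is an illustration only: the `Layer` class, its `add_feature` method, and the minimum vertex counts are hypothetical and do not reproduce how QGIS or any other application implements digitizing.

```python
# Minimum number of stored vertices per geometry type (assumed for this sketch).
# A polygon ring repeats its first vertex, so a triangle stores 4 entries.
MIN_VERTICES = {"point": 1, "polyline": 2, "polygon": 4}


class Layer:
    """A hypothetical single-geometry-type layer that validates edits."""

    def __init__(self, name, geometry_type):
        if geometry_type not in MIN_VERTICES:
            raise ValueError(f"unknown geometry type: {geometry_type}")
        self.name = name
        self.geometry_type = geometry_type
        self.features = []

    def add_feature(self, geometry_type, vertices, **attributes):
        # Reject features whose type does not match the layer.
        if geometry_type != self.geometry_type:
            raise TypeError(
                f"layer '{self.name}' only accepts {self.geometry_type} features"
            )
        # Reject geometries with too few vertices, e.g. a one-vertex line.
        minimum = MIN_VERTICES[geometry_type]
        if len(vertices) < minimum:
            raise ValueError(f"a {geometry_type} needs at least {minimum} vertices")
        self.features.append({"vertices": list(vertices), "attributes": attributes})


# The river monitoring example: outfalls as points, the river as a polyline.
outfalls = Layer("storm_water_outfalls", "point")
river = Layer("river", "polyline")

outfalls.add_feature("point", [(4.0, 7.5)], label="outfall 1")
river.add_feature("polyline", [(0, 0), (3, 4), (6, 5)])

try:
    river.add_feature("polyline", [(0, 0)])  # only one vertex
except ValueError as err:
    print("rejected:", err)
```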

Advantages of vector data:

  • The creator can encode all the important information about a dataset in its geometry.
  • The choice of geometry carries information in itself: for example, why was a point chosen over a polygon?
  • A geometry can carry multiple attributes instead of one, e.g., a dataset of countries can have attributes for name, states, population, GDP, etc.
  • Data storage can be very simple and efficient compared to rasters.

Disadvantages of vector data:

  • Potential loss of detail compared to raster data.
  • Potential bias in datasets: what did the creator miss?
  • Calculations involving multiple vector layers need math on both the geometry and the attributes, so they can be slower.

Vector datasets are used widely across industries and geospatial fields. For instance, computer graphics are largely vector-based, although their data structures tend to join points using arcs and complex curves rather than straight lines. Computer-aided design (CAD) is also vector-based. The difference is that geospatial datasets are accompanied by information tying their features to real-world locations.

Raster Data

Raster data is any pixelated (gridded) data in which each pixel is assigned a defined geographic location. A pixel's value can be categorical/thematic (e.g., land use or forest cover) or continuous (e.g., elevation or temperature) (Escobar et al., 2008). Raster datasets are used extensively; examples include aerial photographs, satellite imagery, and digital maps and images (Guo et al., 2016). Any digital image is essentially a raster, since it consists of pixels. A geographic raster differs from an ordinary digital image only in that each pixel is linked to spatial information: the extent, cell size, number of rows and columns, and spatial reference system are stored with the raster.

Figure 1: Satellite (Landsat 8) imagery as a Raster example.

The raster format can represent real-world phenomena either as thematic (also known as discrete) data, such as land use, or as continuous data, such as spectral data from satellite imagery or phenomena like temperature and elevation. Thematic and continuous rasters can be displayed as layers on a map alongside other geographic data, but they are most often used as the primary input for spatial analysis in GIS software (ENVI, QGIS, ArcGIS with its Spatial Analyst extension, etc.). Picture rasters are frequently stored as attributes in the ‘Attribute Table’, since they can convey additional information about map features and can be displayed together with geographic data.

Examples of continuous rasters are:

1. Temperature maps

2. Precipitation maps

3. Elevation maps

Examples of categorical/thematic maps are:

1. Low, medium, and high temperature maps.

2. Land cover and utility maps
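
The temperature examples in the two lists above are related: a continuous raster becomes a categorical one when its values are grouped into classes. The following is a minimal sketch of such a reclassification in plain Python. The function name `reclassify`, the class codes, the thresholds of 10 and 20 degrees, and the temperature values are all assumptions chosen for illustration.

```python
def reclassify(raster, low_max, medium_max):
    """Map each continuous cell value to a class code: 1=low, 2=medium, 3=high."""
    def classify(value):
        if value <= low_max:
            return 1
        if value <= medium_max:
            return 2
        return 3
    return [[classify(v) for v in row] for row in raster]


# Made-up temperatures in degrees Celsius (continuous, floating point).
temperature = [
    [4.5, 9.0, 14.2],
    [11.8, 17.3, 21.0],
    [19.9, 24.6, 28.1],
]

classes = reclassify(temperature, low_max=10.0, medium_max=20.0)
for row in classes:
    print(row)
```

The input holds floating-point values, while the output holds a small set of integer codes, which is the usual distinction between continuous and thematic rasters.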

Attributes of Raster data

1. Spatial Resolution

Raster spatial resolution has two aspects: the cell size of the raster dataset, and the ratio of screen pixels to image pixels at the current map scale. For instance, one screen pixel can be produced by downsampling nine image pixels into one, a ratio of 1:9. In this case the image is less clear and lacks detail, since each screen pixel must represent nine raster cells. At a ratio of 1:1, each screen pixel shows exactly one raster cell, and zooming in further reveals no additional detail.
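
The 1:9 case can be illustrated with a short sketch that merges each 3 x 3 block of raster cells into one displayed pixel. Averaging the block is an assumption made here for simplicity; display software may use other resampling methods such as nearest neighbour. The function name `downsample` and the 6 x 6 sample raster are hypothetical.

```python
def downsample(raster, factor):
    """Average each factor x factor block of cells into one output cell."""
    rows, cols = len(raster), len(raster[0])
    if rows % factor or cols % factor:
        raise ValueError("raster dimensions must be multiples of the factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [raster[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# A made-up 6 x 6 raster; each 3 x 3 block (9 cells) becomes one screen pixel.
raster = [[r * 6 + c for c in range(6)] for r in range(6)]
display = downsample(raster, 3)

print(len(raster) * len(raster[0]), "cells ->",
      len(display) * len(display[0]), "pixels")
```

The 36 original cells collapse into 4 displayed values, which is why detail is lost at this ratio.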

Figure 2: Raster image showing cell or pixel. Image Source: ESRI Images.

2. Spatial Extent

The spatial extent is the allowable range of the x and y coordinates (and of m and z values, where present). Put simply, it is the geographic area that the raster covers, defined by the edges or locations that are furthest north, south, east, and west.
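
The relationship between extent, cell size, and the number of rows and columns can be sketched as follows. The `RasterGrid` class is a hypothetical illustration that assumes square cells and a north-up grid anchored at its upper-left corner; the coordinates are made-up values in arbitrary map units.

```python
from dataclasses import dataclass


@dataclass
class RasterGrid:
    """Minimal georeferencing info: upper-left corner, cell size, grid shape."""
    x_min: float      # western edge
    y_max: float      # northern edge
    cell_size: float  # square cells, in map units
    rows: int
    cols: int

    def extent(self):
        """Return (x_min, y_min, x_max, y_max) of the area the raster covers."""
        x_max = self.x_min + self.cols * self.cell_size
        y_min = self.y_max - self.rows * self.cell_size
        return (self.x_min, y_min, x_max, self.y_max)


# The same extent covered at two resolutions (made-up values).
coarse = RasterGrid(x_min=500000.0, y_max=5900000.0, cell_size=30.0, rows=100, cols=200)
fine = RasterGrid(x_min=500000.0, y_max=5900000.0, cell_size=15.0, rows=200, cols=400)

print(coarse.extent())
print(coarse.extent() == fine.extent())
```

Halving the cell size while keeping the extent fixed doubles the number of rows and columns, so the finer raster stores four times as many cells for the same area.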

Figure 3: Representing extent in a raster image. Image Source: National Ecological Observatory Network.
Figure 4: Raster image over the same extent, but at 4 different resolutions. Image Source: National Ecological Observatory Network.

“A raster at the same extent with more pixels will have a higher resolution (it looks more "crisp"). A raster that is stretched over the same extent with fewer pixels will look more blurry and will be of lower resolution.” Source: National Ecological Observatory Network (NEON).

Despite this simple structure, raster data has a wide range of uses. Within GIS, raster usage falls into four main categories:

1. Basemaps

2. Thematic maps

3. Surface maps

4. Attributes of a feature

Rasters as Basemaps

Raster data is frequently used in a GIS as a background display for other feature layers. For instance, displaying orthophotographs beneath other layers gives the map user supplementary information and confidence that the layers are spatially aligned and represent real objects. Aerial photographs, satellite imagery, and scanned maps are the primary sources of raster basemaps.

Figure 5: A raster basemap showing buildings and streets of Leuphana University.

Rasters as Thematic Maps

Thematic raster data is produced by analyzing other datasets. A common application is classifying satellite images into land use and land cover classes (Fig. B). Thematic rasters can also result from combining data from different sources, such as vector data, elevation models, and other rasters. For example, a geoprocessing model can be used to create a raster that maps suitability for bike riders.

A common type of thematic map is the choropleth map, in which statistical data is aggregated into known geographical units. The first choropleth map was published by Baron Dupin in 1826 and depicted educational levels in France by department. More recently, choropleth maps were widely used during the COVID-19 pandemic to show the number of COVID cases per country or district.
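
The bike-suitability example can be sketched as a cell-by-cell overlay of two aligned rasters. Everything specific here is an assumption made for illustration: the two input layers (slope and presence of a bike lane), the 5 percent slope threshold, the output codes, and the function name `overlay_suitability`. A real geoprocessing model would likely use more criteria and different rules.

```python
def overlay_suitability(slope_percent, has_bike_lane, max_slope=5.0):
    """Combine two aligned rasters cell by cell into a suitability raster.

    Output codes (hypothetical): 0 = unsuitable, 1 = acceptable, 2 = good.
    """
    result = []
    for slope_row, lane_row in zip(slope_percent, has_bike_lane):
        out_row = []
        for slope, lane in zip(slope_row, lane_row):
            if slope > max_slope:
                out_row.append(0)  # too steep regardless of infrastructure
            elif lane:
                out_row.append(2)  # gentle slope and a bike lane
            else:
                out_row.append(1)  # gentle slope, no bike lane
        result.append(out_row)
    return result


# Made-up inputs: slope derived from an elevation model, and a rasterized
# vector layer marking cells that contain a bike lane (1) or not (0).
slope_percent = [[1.0, 3.5, 8.0], [2.0, 6.5, 4.0]]
has_bike_lane = [[1, 0, 1], [0, 0, 1]]

suitability = overlay_suitability(slope_percent, has_bike_lane)
print(suitability)
```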

Figure 6: Thematic map showing the natural area around Elbe river.

Rasters as Surface Maps

Rasters are well suited to representing data that varies continuously across a landscape (surface), because they offer a practical way to store the continuity of a surface. The most typical use of surface maps is to represent the elevation of the Earth's surface, but values such as temperature, rainfall, sea-surface temperature (SST), and population density can also be displayed as surfaces. The raster below shows elevation in the Himalayas, with green cells representing lower elevations and a gradient towards yellow representing higher elevations.
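
A green-to-yellow gradient like the one described above can be produced by linearly interpolating between two colours according to each cell's elevation. This is only a sketch: the RGB values chosen for green and yellow, the elevation range of 0 to 8000 m, and the function name `elevation_to_rgb` are assumptions, not the settings used for the figure.

```python
def elevation_to_rgb(elevation, low, high):
    """Linearly interpolate from green (at low) to yellow (at high)."""
    green, yellow = (0, 128, 0), (255, 255, 0)
    # Clamp to [0, 1] so out-of-range elevations take the end colours.
    t = min(max((elevation - low) / (high - low), 0.0), 1.0)
    return tuple(round(g + t * (y - g)) for g, y in zip(green, yellow))


# Made-up elevation cells in metres.
elevation = [[0.0, 2000.0], [6000.0, 8000.0]]
colours = [[elevation_to_rgb(v, 0.0, 8000.0) for v in row] for row in elevation]

for row in colours:
    print(row)
```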

Figure 7: Surface map of altitudes in Himalaya.

Rasters as Attributes of a Feature

Rasters used as attributes of a feature can be digital photographs, scanned documents, or scanned drawings that relate to a geographic object or location. A raster-type attribute field stores the image within or alongside the geodatabase. For example, a photograph of the gray wolves in Yellowstone National Park, a keystone species, could be stored as an attribute of the corresponding feature.

General Characteristics of Raster data

Every cell (pixel) in a raster has a value that represents the phenomenon portrayed by the dataset, such as a category, magnitude, height, or spectral value. Categories are classes such as forest, river, agricultural land, or buildings in a land-use and land-cover raster. Magnitude might represent values of temperature, precipitation, etc. Height could represent altitude, i.e., the distance above mean sea level. Spectral values represent light reflectance and color in aerial photography and satellite imagery. Pixel values can be positive or negative, and integer or floating point: integer values best represent categorical data, while floating-point values represent continuous surfaces. Cells with missing data are marked as NoData.
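
The sketch below illustrates these characteristics with two small made-up rasters: a floating-point elevation raster containing NoData cells, and an integer land-cover raster. The sentinel value -9999, the class codes, and the legend are assumptions for this example; real datasets declare their own NoData value and legend.

```python
from collections import Counter

NODATA = -9999  # assumed sentinel marking missing cells

# Continuous surface: floating-point elevations in metres, two cells missing.
elevation = [
    [212.5, 215.0, NODATA],
    [210.0, NODATA, 221.5],
]

# Categorical raster: integer codes with a hypothetical legend.
land_cover = [
    [1, 1, 2],
    [3, 2, 2],
]
LEGEND = {1: "forest", 2: "agriculture", 3: "river"}


def mean_ignoring_nodata(raster, nodata=NODATA):
    """Average all valid cells; return None if every cell is NoData."""
    values = [v for row in raster for v in row if v != nodata]
    return sum(values) / len(values) if values else None


def class_counts(raster, legend):
    """Count how many cells fall into each named category."""
    counts = Counter(v for row in raster for v in row)
    return {legend[code]: n for code, n in counts.items()}


print(mean_ignoring_nodata(elevation))
print(class_counts(land_cover, LEGEND))
```

Averaging makes sense for the continuous raster, whereas the categorical raster is summarized by counting cells per class; including the NoData sentinel in the average would badly distort the result.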

References

1. Abubahia, A., Cocea, M. (2015). A Clustering Approach for Protecting GIS Vector Data. In: Zdravkovic, J., Kirikova, M., Johannesson, P. (eds) Advanced Information Systems Engineering. CAiSE 2015. Lecture Notes in Computer Science, vol 9097. Springer, Cham. https://doi.org/10.1007/978-3-319-19069-3_9

2. Goodchild, M. F. (2011). Scale in GIS: An overview. Geomorphology, 130(1-2), 5-9. https://doi.org/10.1016/j.geomorph.2010.10.004

3. QGIS Documentation. https://docs.qgis.org/3.28/en/docs/

4. Escobar, F., Hunter, G., Bishop, I., & Zerger, A. (2008). Introduction to GIS. Department of Geomatics, The University of Melbourne. Available online at: http://www.sli.unimelb.edu.au/gisweb/ (accessed 02 April 2008).

5. Guo, N., Xiong, W., Wu, Q., & Jing, N. (2016). An efficient tile-pyramids building method for fast visualization of massive geospatial raster datasets. Advances in Electrical and Computer Engineering, 16(4), 3-9.

6. Nagy, C., Balch, J., Bissell, E., Cattau, M., Glenn, N., Halpern, B., Ilangakoon, N., Johnson, B., Joseph, M., Marconi, S., O’Riordan, C., Sanovia, J., Swetnam, T., Travis, W., Wasser, L., Woolner, E., Zarnetske, P., Abdulrahim, M., Adler, J., & Zhu, K. (2021). Harnessing the NEON data revolution to advance open environmental science with diverse and data-capable community. Ecosphere, 12. https://doi.org/10.1002/ecs2.3833

Tutorial about adding various datasets to QGIS

How to load a dataset

There are several ways to load data into a QGIS project, each suited to different purposes. The easiest is the Data Source Manager: click its icon at the top left and a window will open. The Data Source Manager lets you load different kinds of data, with a separate option for each data type. We will first load vector data. To do so, click on Vector.

Vector data source.jpg

First, choose the Source Type File. Then click the three dots next to the Vector Dataset(s) field. A dialog will open in which you can choose the file you want to use. To load the data, click Add. Note: This way of loading data into QGIS only loads one specific type of data, in this case vector data. To load other types of data, you must use other methods, which are explained in further tutorials.

Vector source.jpg

To load raster data, click Raster instead of Vector in the Data Source Manager (first picture), choose the file, and click Add.