Connect to SQL SERVER in R


R is an OPEN SOURCE software environment for statistical computing and graphics. It is easy to use and create valuable data visualization in a minute. Learn more about R.

This blog post is to explain the steps to connect to SQL Server database in R and display simple graphics.

Over all steps

  1. Create ODBC Data Source
  2. Setup R environment to connect to SQL Server
  3. Query the SQL table, display data and graphics

Step 1: Create ODBC Data Source:

Goto, Control Panel -> Administrative Tools -> Data Sources (ODBC)

DSN1

Click Add button

dsn2

select SQL Server Native Client and click Finish

dsn3

Give data source name and SQL Server name to connect and Click Next

dsn4

Select an appropriate SQL Server authentication option and give valid credential. In this example, we are using Integrated Windows Authentication. Click next

dsn5

Select the database to work with and modify the application intent option to Read-only. These settings are for this exercise only. We may use default setting and click next.

dsn6

Click finish and TEST connection

dsn7

Step 2: Setup R environment to connect to SQL Server.  Please install R software.

Install RODBC package to R:

  • install.packages(“RODBC”)

Load the RODBC library to current environment to use:

  • library(RODBC)

Connect to SQL Server using ODBC connection:

  • wrpt <-odbcConnect(“weatherdata”)

We have just connected to SQL Server and the R connection variable name is wrpt. We can use this variable to query a table and more.

Step 3: Query the SQL table to display data and graphics

This statement loads the entire table “tblTemp” in to the variable weather.

weather <- sqlFetch(wrpt, “tblTemp”)  

This statement displays the entire table

print(weather)

This statement displays the summary of the data-set

summary(weather)

rdata

This statement displays a graph with daily max temperature for a  month

plot(weather$maxtemp, type=”l”)

rgraph

R manuals

Thanks for reading this post.

SQL SERVER – Declare Mulitple Variables in Single Line


This may be a simple thing, but sometime new developers are difficult to find this information readily. So, thought of adding it in this website.

Question
How to declare and initialize variables in single or multiple lines?

Answer:

Declaring variables – Multiple lines

DECLARE @int_i AS INT;
DECLARE @chr_Name AS VARCHAR(50);

Declare and Initialize variables – Multiple lines

DECLARE @int_i AS INT = 1;
DECLARE @chr_Name AS VARCHAR(50) = ‘Test’;

Declare variables in single line

DECLARE @int_i AS INT, @chr_Name AS VARCHAR(50);

Declare and Initialize variables in single line

DECLARE @int_i AS INT = 1, @chr_Name AS VARCHAR(50) = ‘Test’;

That is it, You have learned to declare and initialize variables in MS SQL Server.

Q&A – Removing special characters at the end of the string in SQL Server


Recently, I have received a question from a reader asking me how to remove special characters at the end of the string without using TSQL. This look very simple until I started thinking about the answer. Because, the requirement is not to create TSQL to complete this task. I did try different approach and finally got the a String Function in SQL Server that helped me solving this problem.

Sample source data is given below

Sophia123
Emma#1
Olivia$23
Aiden???
Jackson!%
Ethan_123%

These are the names that I choose for this experiment. Following is the step by step approach to solve this problem

Solution

Step 1: Create a Test table

Create table TestNames
(
Firstname  varchar(40)
)
GO

Step 2: Insert sample values into the test table

Insert into TestNames (Firstname) values('Sophia123')
Insert into TestNames (Firstname) values('Emma#1')
Insert into TestNames (Firstname) values('Olivia$23')
Insert into TestNames (Firstname) values('Aiden???')
Insert into TestNames (Firstname) values('Jackson!%')
Insert into TestNames (Firstname) values('Ethan_123%')
GO

Step 3: This is the final step to remove the special character and numbers at the end of the string

I used PATINDEX() string function to accomplish this task. I have also make use of REVERSE(), LEN() and SUBSTRING() functions to simplify the data cleaning process.

Step a: We need to Reverse the whole string

Step b: Find the first occurrence of the alphabets both lower and upper case using wild card characters.

Step c: Extract the SubString() starting from the position we got it in the step b to total length of the string.

Step d: Now, Reverse the sub string to get the desire output.

SQL Query

Select 
REVERSE(
    SUBSTRING(
        REVERSE(Firstname), 
        PATINDEX(
            '%[a-z][A-Z]%', 
            REVERSE(Firstname)
            ), 
        LEN(Firstname)
    )
) 
as 'Firstname'
from TestNames
GO

Output
Sophia
Emma
Olivia
Aiden
Jackson
Ethan

This is the method I used to solve this problem. If you have any other solution, Please drop it in the comment section.

GeoSpatial – GIS File Formats


GIS file formats are the standards that are used to store the Geo-spatial data. These data formats are created by government entities, educational institution, geological information storage and service organization and etc.

There are many popular open source or proprietary formats available. Following are the most popular format which we are using in day today geography data.

  • Raster
  • Vector
  • Grid
  • Others

Raster

Raster data type consists of rows and columns of cells and each cell storing a single value. It uses color code RBG.  An external attribute table can be maintained each cell by rows. Raster data can stored in various formats such file-based structure of TIF, JPEG and etc. We can also use RDBMS to store this data as BLOB.

Vector

GIS data in vector format contains typical geometry shapes such as points, line, polygon and etc. to represents geological location or regions in the earth. Usually, vector data may also comes with the database that describes the attributes of the shapes. For instance, if we use point on a building the attribute database may have its address, phone number and etc. to provide additional details about that location.

To understand better about raster and vector, it is ideal to have a look at the below given image.

raster-vector-gis-i4Vector GIS has more flexible and simplicity to integrate in to the application. It is more developer friendly and simple to storage and use.

An Esri grid is a raster GIS file format used to represent elevation and it is developed by Esri.

There are two formats available now:

  1. A proprietary binary format
  2. A non-proprietary ASCII format

There are different organization and their software uses different formats to store GIS data. Below given list is taken from Wikipedia.

Raster:

Vector:

Other GIS data storage formats

Vector format, WKT and WKB are the useful GIS data storage and manipulation formats in SQL Server. We will learn about that in the next blog post.

SQL Server – Definition and Calculation for Spatial Data


In this blog post. I am just giving the introduction to the Geo Spatial data representation in general and for SQL Server.

Measurements

In the planar or flat-earth system, measurements of distances and areas are given in the same unit of measurement as coordinates. Using the geometry data type, the distance between (2, 2) and (5, 6) is 5 units, regardless of the units used.

In the ellipsoidal or round-earth system, coordinates are given in degrees of latitude and longitude. However, lengths and areas are usually measured in meters and square meters, though the measurement may depend on the spatial reference identifier (SRID) of the geography instance. The most common unit of measurement for the geography data type is meters

Geometry is a very complex data type. So, OGC-SFS defines two alternative standard notations to represent Geometry values:

  • The WKT (Well Known Text) notation is intended to be user-friendly (not really so user-friendly after all, but at least human readable).
  • The WKB (Well Known Binary) notation on the other side is more intended for precise and accurate import/export/exchange of Geometries between different platforms.

Well-Known Text Representation of Spatial Reference Systems

The definitions of the well-known text representations are modeled after the POSC/EPSG coordinate data.

Well-Known Binary Representation for Geometry

The WKB representation of a Geometry value is a continuous stream of bytes. It is obtained by serializing a geometric object as a sequence of numeric types drawn from the set {Unsigned Integer, Double} and then serializing each numeric type as a sequence of bytes using one of two well-defined, standard, binary representations for numeric types (NDR, XDR).

GML in SQL Server

The Geography Markup Language (GML) is the XML grammar defined by the Open Geospatial Consortium (OGC) to express geographical features. The geometry data type provides a method that generates a geometry instance from GML, an XML representation of geometric objects. SQL Server supports a subset of GML.

Full Globe in SQL Server

In SQL Server 2012, FullGlobe is a special type of Polygon that covers the entire globe. FullGlobe has an area, but no borders or vertices.

Spatial Reference Identifiers (SRID)

Spatial Reference System Identifier (SRID) is a unique value that is used to defined and identify a spatial coordinate system. This value id the most important value for any GIS application.

It can be created and implementation by any spatial vendors. But, we have the option to refer to those that are created and maintaining by authorities. Following are the most used SRID system.

Other similar referencing systems are

Because, we have to use proper coordinate system in our GIS application in order get accurate result with spatial data processing. For instance, we cannot use two different referencing systems to do any GIS calculations. In SQL Server, if you do so then it will return a null value.