Friday, February 21, 2014

SSIS 2012 Script Task "Could not load file or assembly"

I ran across a bug with the SQL 2012 install and SSIS script tasks. I was working on a simple FTP file-exists check, and for some reason, after adding the System.Net.dll reference, the script task would fail every time. I hit the web and came across a workaround that actually worked.

After finding the article, I eventually figured out I could get all the components installed by running some of the installers on the SQL 2012 DVD:

  • redist/VSTA/designtime/VSTA_DT30.msi 
  • redist/VSTA/runtime/x64/VSTA_RT30.msi
  • redist/VSTA/runtime/x86/VSTA_RT30.msi

Thursday, February 13, 2014

AWS RedShift/S3 Data Warehouse Solution


Well, over the last week I have been asked to set up an Amazon Web Services Redshift data warehouse solution. At first I was a little taken aback since I didn't know much about it, but over the past couple of days I have been reading up on the infrastructure and design principles. At the end of the day, after jumping through a few hoops, it comes down to a traditional star schema design on a PostgreSQL backend. Below I will outline some key resources to help get started with the project.


  • Create an account with Amazon. Most of you already have one; you just need to sign in to the AWS Console. 
  • Set up a drop point in S3. This allows you to drop bulk-operation files into the online store for batch operations over to Redshift.
  • Set up a Redshift cluster. Size and scale are key here to cost. Make sure to start small and scale as your data set grows.
  • Set up external IP access. By default you will not be able to connect via the endpoint or JDBC/ODBC connections.
  • Set up local dev connectivity. I used SQL Workbench/J via the JDBC drivers. 
  • Now that you are set up and connected, you can start creating a schema: database, users, tables, and stored procedures. One thing to keep in mind while developing your schema is that you are working with a limited list of PostgreSQL data types (see the table sketch after this list). 
  • Get mass amounts of data into your new cloud-hosted solution. This is where it gets a little tricky. Typically a quick ETL tool like SSIS or Talend would do the trick, but those tools send row-by-row INSERT statements, while Redshift is built for bulk loads; the difference is on the order of 10 rows per second versus 100K rows per second. So Amazon recommends the S3-to-Redshift COPY command, which I would liken to a T-SQL bulk copy command in MSSQL (see the COPY sketch after this list). 
  • Programmatic solution to loading data into S3
    • I found the AWS SDK, and they have examples of how to do this. The key is to make sure the permissions are set up correctly and the region is bound correctly in the app config.
  • Using Data Pipeline to automate data movement from your S3 drop point to Redshift
    • more to come
  • Staging updates and deletes
    • more to come
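
To make the data-type point concrete, here is a minimal sketch of a Redshift table that sticks to the supported PostgreSQL-compatible types. The table and column names are hypothetical, and note that DISTKEY/SORTKEY are Redshift-specific extensions.

-- hypothetical fact table using only Redshift-supported data types
CREATE TABLE sales_fact
(
    sale_id     INTEGER       NOT NULL,
    sale_date   DATE          NOT NULL,
    customer_id INTEGER,
    amount      DECIMAL(12,2),
    channel     VARCHAR(20)
)
DISTKEY (customer_id)
SORTKEY (sale_date);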

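And here is a minimal sketch of the S3-to-Redshift COPY command mentioned above; the bucket, file name, and credentials are placeholders.

-- bulk load from the S3 drop point; roughly the Redshift equivalent of BULK INSERT
COPY sales_fact
FROM 's3://my-drop-bucket/sales_fact.csv'
CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
DELIMITER ',';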

Please check back; as I develop the solution I will provide additional updates. 


Monday, January 6, 2014

invoke-expression for sqlcmd from powershell

I was having some issues running sqlcmd via PowerShell against a SQL instance with a custom port. Try the following commands.

Does not work:
sqlcmd -S USHXXDBWD002\I0007DS01,4070 -E -Q "select getdate()"

Does work:
$query = "select getdate()"
$cmd = "SQLCMD -S `"USHGNDBWD002\I0007DS01,4070`" -h-1 -Q `"$query`""
write-host "command is: $cmd"
invoke-expression $cmd

So, when calling SQLCMD directly, quotes were not necessary around the server name; however, when invoking SQLCMD from PowerShell using Invoke-Expression, the multi-part server name (instance plus port) needed quotes.


Thursday, June 20, 2013

PL/SQL Purge Recycle Bin


Recently I came across several objects that needed to be purged from our Oracle DW. It seems that after a DROP TABLE the object still remains in the recycle bin.

Example to query for the objects:

SELECT 'purge table "' || object_name || '" ;' ,
       original_name ,
       type ,
       can_undrop AS "UND" ,
       can_purge AS "PUR" ,
       droptime
FROM   recyclebin

Example of purge:
purge table "BIN$3EgaHwhNbdPgQy8CKArrew==$0" ;
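
If you would rather clear everything in one shot, Oracle also supports purging the entire recycle bin; note the DBA variant requires SYSDBA privileges.

-- purge all dropped objects for the current user
PURGE RECYCLEBIN;

-- purge all dropped objects system-wide (requires SYSDBA)
PURGE DBA_RECYCLEBIN;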



Reference: Using Oracle's recycle bin

Monday, June 10, 2013

SSAS MDX Query for Current Year, Current Month and Current Day

Recently I had a need to retrieve the current year via an MDX query. Below are some very useful VBAMDX functions that help with looking at current periods. 

MDX for retrieving the current Year:
WITH MEMBER [Measures].[Current Year] AS
    VBAMDX.Format(VBAMDX.Now(),"yyyy")
SELECT
  {[Measures].[Current Year]} ON COLUMNS
FROM [Adventure Works];

MDX for retrieving the current Month:

WITH MEMBER [Measures].[Current Month] AS
    VBAMDX.Format(VBAMDX.Now(),"MM")
SELECT
  {[Measures].[Current Month]} ON COLUMNS
FROM [Adventure Works];


MDX for retrieving the current Date:

WITH MEMBER [Measures].[Current Date] AS
    VBAMDX.Format(VBAMDX.Now(),"dd")
SELECT
  {[Measures].[Current Date]} ON COLUMNS
FROM [Adventure Works];

Sometimes a user wants to retrieve data for the current year, current month, or current day. You can retrieve the data by combining the VBA Format function with the StrToMember function.

Check the following MDX samples against the Adventure Works sample database.

MDX for retrieving the current Year:

WITH
  MEMBER [Measures].[Current Year Value] AS
    (
      StrToMember
      (
            "[Date].[Calendar].[Calendar Year].&["
          + VBAMDX.Format(VBAMDX.Now(),"yyyy") + "]"
      )
     ,[Measures].[Internet Sales Amount]
    )
SELECT
  [Measures].[Current Year Value] ON COLUMNS
FROM [Adventure Works];


MDX for retrieving the current Month:

WITH
  MEMBER [Measures].[Current Month Value] AS
    (
      StrToMember
        (
            "[Date].[Calendar].[Month].&["
            + VBAMDX.Format(VBAMDX.Now(),"yyyy")
            + "]&[" + VBAMDX.Format(VBAMDX.Now(),"MM") + "]"
        )
     ,[Measures].[Internet Sales Amount]
    )
SELECT
  [Measures].[Current Month Value] ON COLUMNS
FROM [Adventure Works];

MDX for retrieving the current Day:

WITH
  MEMBER [Measures].[Today's Value] AS
    (
      StrToMember
      (
           "[Date].[Calendar].[Date].&[" +
           VBAMDX.Format(VBAMDX.Now(),"yyyyMMdd")
           + "]"
      )
     ,[Measures].[Internet Sales Amount]
    )
SELECT
  [Measures].[Today's Value] ON COLUMNS
FROM [Adventure Works];

Thursday, May 23, 2013

DimDate Dimension

If you are looking for a DimDate script for your EDW, this one may work for you.


IF OBJECT_ID('DimDate') IS NOT NULL 
    DROP TABLE DimDate
GO


CREATE TABLE [dbo].[DimDate]
    (
      [DateKey] [int] NOT NULL ,
      [CalendarDate] [smalldatetime] NULL ,
      [CalendarWeek] [varchar](8) NULL ,
      CalendarWeekNumber INT ,
      [CalendarMonth] [varchar](8) NULL ,
      [CalendarMonthName] [varchar](30) NULL ,
      CalendarMonthNumber INT ,
      [CalendarQuarter] [varchar](8) NULL ,
      [CalendarYear] [varchar](4) NULL ,
      [MonthStart] [smalldatetime] NULL ,
      [MonthEnd] [smalldatetime] NULL ,
      WeekDay INT ,
      WeekDayName VARCHAR(20) ,
      WeekDayType VARCHAR(20) ,
      CONSTRAINT [PK_DimDate] PRIMARY KEY CLUSTERED ( [DateKey] ASC )
    )
ON  [PRIMARY]
GO
CREATE UNIQUE INDEX IX_DimDate ON DimDate(CalendarDate)
GO


--======================================
-- Populate the dimension
--======================================
;
WITH    DateCTE
          AS ( SELECT   CAST('1995-01-01' AS DATE) AS DateValue
               UNION ALL
               SELECT   DATEADD(d, 1, DateValue)
               FROM     DateCTE
               WHERE    DATEADD(d, 1, DateValue) < '2015-01-01'
             )
    INSERT  INTO DimDate
            SELECT  CAST(CONVERT(CHAR(8), CAST(DateValue AS DATETIME), 112) AS INT) AS DateKey ,
                    CAST(DateValue AS SMALLDATETIME) AS CalendarDate ,
                    CAST(YEAR(DateValue) AS CHAR(4)) + '-'
                    + CAST(DATEPART(wk, DateValue) AS VARCHAR(2)) AS CalendarWeek ,
                    DATEPART(wk, DateValue) AS CalendarWeekNumber ,
                    CAST(YEAR(DateValue) AS CHAR(4)) + '-'
                    + CASE WHEN DATEPART(m, DateValue) < 10
                           THEN '0' + CAST(DATEPART(m, DateValue) AS VARCHAR(2))
                           ELSE CAST(DATEPART(m, DateValue) AS VARCHAR(2))
                      END AS CalendarMonth ,
                    DATENAME(MONTH, DateValue) AS CalendarMonthName ,
                    DATEPART(m, DateValue) AS CalendarMonthNumber ,
                    CAST(YEAR(DateValue) AS CHAR(4))
                    + CAST(DATEPART(Quarter, DateValue) AS CHAR(1)) AS CalendarQuarter ,
                    YEAR(DateValue) AS CalendarYear ,
                    CAST(CONVERT(VARCHAR(10), ( CONVERT(CHAR(6), CAST(DateValue AS DATETIME), 112)
                                                + '01' ), 120) AS SMALLDATETIME) AS CalendarMonthStart ,
                    CAST(CONVERT(VARCHAR(10), CAST(CONVERT(CHAR(8), CAST(CONVERT(VARCHAR(12), DATEADD(day,
                                                              -1
                                                              * DAY(DATEADD(month,
                                                              1, DateValue)),
                                                              DATEADD(month, 1,
                                                              DateValue)), 113) AS DATETIME), 112) AS INT), 120) AS SMALLDATETIME) AS CalendarMonthEnd ,
                    DATEPART(dw, DateValue) AS WeekDay ,
                    DATENAME(dw, DateValue) AS WeekDayName ,
                    CASE WHEN DATENAME(dw, DateValue) IN ( 'Saturday',
                                                           'Sunday' )
                         THEN 'WeekEnd'
                         ELSE 'WeekDay'
                    END AS WeekDayType
            FROM    DateCTE a
            ORDER BY DateKey
    OPTION  ( MAXRECURSION 32000 )
GO


SELECT * FROM  dbo.DimDate 
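
A quick sanity check on the load, assuming the 1995-2015 range above:

-- confirm the populated range and row count
SELECT MIN(CalendarDate) AS FirstDate ,
       MAX(CalendarDate) AS LastDate ,
       COUNT(*) AS DayCount
FROM   dbo.DimDate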


Wednesday, May 22, 2013

Value of a Partitioned Table

Recently I came across some larger tables in our production CRM database and decided to do some testing on the value of adding a partition to an existing large MS SQL table (177 million rows). At the end of the day, as with everything in IT, it depends on how the application interacts with the database/table. In my example, I decided to use a yearly partition scheme, which seems to be the most common approach with larger tables. The script below will walk you through and directly compare the same query against a non-partitioned clustered index and a partitioned clustered index. If you review the pre/post execution plans, you will notice that the estimated I/O cost is greatly reduced by the partition; I estimate around a 98% reduction. Based on additional testing against the production table, I feel the reduction in estimated I/O cost from a partition scheme is something everyone should look into on larger tables.



-- check for table
IF OBJECT_ID('dbo.TestPartitionTable') IS NOT NULL 
    BEGIN
        DROP TABLE dbo.TestPartitionTable
    END
go

-- create table
CREATE TABLE [dbo].[TestPartitionTable]
    (
      [pkcol] [int] NOT NULL ,
      [datacol1] [int] NULL ,
      [datacol2] [int] NULL ,
      [datacol3] [varchar](50) NULL ,
      [partitioncol] DATETIME
    )
GO

-- Populate table 
DECLARE @val INT
SELECT  @val = 1
WHILE @val < 1000001  -- million rows
    BEGIN  
        INSERT  INTO dbo.TestPartitionTable
                ( pkcol ,
                  datacol1 ,
                  datacol2 ,
                  datacol3 ,
                  partitioncol
                )
        VALUES  ( @val ,
                  @val ,
                  @val ,
                  'TEST' ,
                  GETDATE() - ( @val * .01 )
                )
        SELECT  @val = @val + 1
        PRINT @val
    END
GO

/*
-- recovery if needed
INSERT INTO dbo.TestPartitionTable
        ( pkcol ,
          datacol1 ,
          datacol2 ,
          datacol3 ,
          partitioncol
        )
SELECT * FROM dbo.TestPartitionTableBackup
*/

-- added standard PK
ALTER TABLE dbo.TestPartitionTable ADD CONSTRAINT PK_TestPartitionTable PRIMARY KEY CLUSTERED (pkcol) 
GO
CREATE NONCLUSTERED INDEX IX_TestPartitionTable_col2col3 ON dbo.TestPartitionTable (datacol1,datacol2)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
ON [PRIMARY]
GO

-- check plan performance prior to partition

 -- Estimated I/O Cost
 -- The estimated cost of all I/O activity for the operation. This value should be as low as possible
 -- 4.50542
SELECT  *
FROM    TestPartitionTable
WHERE   partitioncol >= '20130101'


-- drop any existing partition scheme and function
DROP PARTITION SCHEME myPartitionScheme 
GO
DROP PARTITION FUNCTION pfDateRange
GO

-- create date range partition
CREATE PARTITION FUNCTION pfDateRange (DATETIME)
AS RANGE RIGHT FOR VALUES ('20110101', '20120101','20130101')
GO
CREATE PARTITION SCHEME myPartitionScheme 
AS PARTITION pfDateRange ALL TO ([PRIMARY]) 
GO
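
To verify where a given date will land before moving any data, you can call the partition function through $PARTITION (a quick check against the pfDateRange function above):

-- returns the 1-based partition number that would hold this sample date
SELECT $PARTITION.pfDateRange('20130601') AS PartitionNumber
GO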

-- check the partition schema and range
SELECT  ps.name ,
        pf.name ,
        boundary_id ,
        value
FROM    sys.partition_schemes ps
        INNER JOIN sys.partition_functions pf ON pf.function_id = ps.function_id
        INNER JOIN sys.partition_range_values prf ON pf.function_id = prf.function_id

-- drop original cluster index
ALTER TABLE dbo.TestPartitionTable DROP CONSTRAINT PK_TestPartitionTable
GO
-- add noncluster index
ALTER TABLE dbo.TestPartitionTable ADD CONSTRAINT PK_TestPartitionTable PRIMARY KEY NONCLUSTERED  (pkcol)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
-- add partition index
CREATE CLUSTERED INDEX IX_TestPartitionTable_partitioncol ON dbo.TestPartitionTable (partitioncol)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
ON myPartitionScheme(partitioncol)
GO

-- check the partitions and their boundaries
SELECT  t.name AS TableName ,
        i.name AS IndexName ,
        p.partition_number ,
        p.partition_id ,
        i.data_space_id ,
        f.function_id ,
        f.type_desc ,
        r.boundary_id ,
        r.value AS BoundaryValue
FROM    sys.tables AS t
        JOIN sys.indexes AS i ON t.object_id = i.object_id
        JOIN sys.partitions AS p ON i.object_id = p.object_id
                                    AND i.index_id = p.index_id
        JOIN sys.partition_schemes AS s ON i.data_space_id = s.data_space_id
        JOIN sys.partition_functions AS f ON s.function_id = f.function_id
        LEFT JOIN sys.partition_range_values AS r ON f.function_id = r.function_id
                                                     AND r.boundary_id = p.partition_number
WHERE   t.name = 'TestPartitionTable'
        AND i.type <= 1
ORDER BY p.partition_number;
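
To confirm the rows actually spread across the partitions, per-partition row counts are available from sys.partitions:

-- row count per partition for the clustered index (or heap)
SELECT partition_number ,
       rows
FROM   sys.partitions
WHERE  object_id = OBJECT_ID('dbo.TestPartitionTable')
       AND index_id <= 1
ORDER BY partition_number;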

-- partition column info
SELECT  t.object_id AS Object_ID ,
        t.name AS TableName ,
        ic.column_id AS PartitioningColumnID ,
        c.name AS PartitioningColumnName
FROM    sys.tables AS t
        JOIN sys.indexes AS i ON t.object_id = i.object_id
        JOIN sys.columns AS c ON t.object_id = c.object_id
        JOIN sys.partition_schemes AS ps ON ps.data_space_id = i.data_space_id
        JOIN sys.index_columns AS ic ON ic.object_id = i.object_id
                                        AND ic.index_id = i.index_id
                                        AND ic.partition_ordinal > 0
WHERE   t.name = 'TestPartitionTable'
        AND i.type <= 1
        AND c.column_id = 1;


SELECT  OBJECT_SCHEMA_NAME(i.object_id) AS [schema] ,
        OBJECT_NAME(i.object_id) AS [object] ,
        i.name AS [index] ,
        s.name AS [partition_scheme]
FROM    sys.indexes i
        JOIN sys.partition_schemes s ON i.data_space_id = s.data_space_id

 -- Estimated I/O Cost
 -- The estimated cost of all I/O activity for the operation. This value should be as low as possible
 -- from 4.50542 to 0.0717498
SELECT  *
FROM    TestPartitionTable
WHERE   partitioncol >= '20130101'

-- reset if needed
--SELECT * INTO TestPartitionTableBackup FROM TestPartitionTable
