fsugeiger: 2013

Thursday, June 20, 2013

PL/SQL Purge Recycle Bin

Recently came across several objects that needed to be purged from our Oracle DW. It seems after a drop table the object still remains in the RecycleBin.

Example to query for the objects:

select 'purge table "' || object_name || '" ;' , original_name, type, can_undrop as "UND", can_purge as "PUR", droptime from recyclebin

Example of purge:
purge table "BIN$3EgaHwhNbdPgQy8CKArrew==$0" ;

Using Oracle's recycle bin

Monday, June 10, 2013

SSAS MDX Query for Current Year, Current Month and Current Day

Recently I had a need to retrieve current year via an MDX query. Below are some very useful VBAMDX functions that will help with looking at current periods.

MDX for retrieving current Year;

WITH MEMBER [Measures].[Current Year] AS

VBAMDX.Format(VBAMDX.Now(),"yyyy")

SELECT

{[Measures].[Current Year]} ON COLUMNS

FROM [Adventure Works];

MDX for retrieving current Month;

WITH MEMBER [Measures].[Current Month] AS

VBAMDX.Format(VBAMDX.Now(),"MM")

SELECT

{[Measures].[Current Month]} ON COLUMNS

FROM [Adventure Works];

MDX for retrieving current Date;

WITH MEMBER [Measures].[Current Date] AS

VBAMDX.Format(VBAMDX.Now(),"dd")

SELECT

{[Measures].[Current Date]} ON COLUMNS

FROM [Adventure Works];

Sometimes user wants to retrieve data for current year, current month, and current day. You can retrieve data by using VBA function and STRTOMEMBER function.

Check following MDX samples by using Adventure Works sample.

MDX for retrieving current Year;

WITH

MEMBER [Measures].[Current Year Value] AS

(

StrToMember

(

"[Date].[Calendar].[Calendar Year].&["

+ VBAMDX.Format(VBAMDX.Now(),"yyyy") + "]"

)

,[Measures].[Internet Sales Amount]

)

SELECT

[Measures].[Current Year Value] ON COLUMNS

FROM [Adventure Works];

MDX for retrieving current Month;

WITH

MEMBER [Measures].[Current Month Value] AS

(

StrToMember

(

"[Date].[Calendar].[Month].&[" +

VBAMDX.Format(VBAMDX.Now(),"yyyy") + "]

&[" + VBAMDX.Format(VBAMDX.Now(),"MM") + "]"

)

,[Measures].[Internet Sales Amount]

)

SELECT

[Measures].[Current Month Value] ON COLUMNS

FROM [Adventure Works];

MDX for retrieving current Day;

WITH

MEMBER [Measures].[Today's Value] AS

(

StrToMember

(

"[Date].[Calendar].[Date].&[" +

VBAMDX.Format(VBAMDX.Now(),"yyyyMMdd")

+ "]"

)

,[Measures].[Internet Sales Amount]

)

SELECT

[Measures].[Today's Value] ON COLUMNS

FROM [Adventure Works];

Thursday, May 23, 2013

DimDate Dimension

If you are looking for a DimDate script for your EDW, this one may work for you.

IF OBJECT_ID('DimDate') IS NOT NULL
DROP TABLE DimDate
GO

CREATE TABLE [dbo].[DimDate]
(
[DateKey] [int] NOT NULL ,
[CalendarDate] [smalldatetime] NULL ,
[CalendarWeek] [varchar](8) NULL ,
CalendarWeekNumber INT ,
[CalendarMonth] [varchar](8) NULL ,
[CalendarMonthName] [varchar](30) NULL ,
CalendarMonthNumber INT ,
[CalendarQuarter] [varchar](8) NULL ,
[CalendarYear] [varchar](4) NULL ,
[MonthStart] [smalldatetime] NULL ,
[MonthEnd] [smalldatetime] NULL ,
WeekDay INT ,
WeekDayName VARCHAR(20) ,
WeekDayType VARCHAR(20)
CONSTRAINT [PK_DimDate] PRIMARY KEY CLUSTERED ( [DateKey] ASC )
)
ON [PRIMARY]
GO
CREATE UNIQUE INDEX IX_DimDate ON DimDate(CalendarDate)
GO

--======================================
-- Populate the dimension
--======================================
;
WITH DateCTE
AS ( SELECT CAST('1995-01-01' AS DATE) AS DateValue
UNION ALL
SELECT DATEADD(d, 1, DateValue)
FROM DateCTE
WHERE DATEADD(d, 1, DateValue) < '2015-01-01'
)
INSERT INTO DimDate
SELECT CAST(CONVERT(CHAR(8), CAST(DateValue AS DATETIME), 112) AS INT) AS DateKey ,
CAST(DateValue AS SMALLDATETIME) AS CalendarDate ,
CAST(YEAR(DateValue) AS CHAR(4)) + '-'
+ CAST(DATEPART(wk, DateValue) AS VARCHAR(2)) AS CalendarWeek ,
DATEPART(wk, DateValue) AS CalendarWeekNumber ,
CAST(YEAR(DateValue) AS CHAR(4)) + '-'
+ CASE WHEN DATEPART(m, DateValue) < 10
THEN '0' + CAST(DATEPART(m, DateValue) AS CHAR(4))
ELSE CAST(DATEPART(m, DateValue) AS CHAR(4))
END AS CalendarMonth ,
DATENAME(MONTH, DateValue) AS CalendarMonthName ,
DATEPART(m, DateValue) AS CalendarMonthNumber ,
CAST(YEAR(DateValue) AS CHAR(4))
+ CAST(DATEPART(Quarter, DateValue) AS CHAR(1)) AS CalendarQuarter ,
YEAR(DateValue) AS CalendarYear ,
CAST(CONVERT(VARCHAR(10), ( CONVERT(CHAR(6), CAST(DateValue AS DATETIME), 112)
+ '01' ), 120) AS SMALLDATETIME) AS CalendarMonthStart ,
CAST(CONVERT(VARCHAR(10), CAST(CONVERT(CHAR(8), CAST(CONVERT(VARCHAR(12), DATEADD(day,
-1
* DAY(DATEADD(month,
1, DateValue)),
DATEADD(month, 1,
DateValue)), 113) AS DATETIME), 112) AS INT), 120) AS SMALLDATETIME) AS CalendarMonthEnd ,
DATEPART(dw, DateValue) AS WeekDay ,
DATENAME(dw, DateValue) AS WeekDayName ,
CASE WHEN DATENAME(dw, DateValue) IN ( 'Saturday',
'Sunday' )
THEN 'WeekEnd'
ELSE 'WeekDay'
END AS WeekDayType
FROM DateCTE a
ORDER BY DateKey
OPTION ( MAXRECURSION 32000 )
GO

SELECT * FROM dbo.DimDate

Wednesday, May 22, 2013

Value of a Partitioned Table

Recently I came across some larger tables in our production CRM database and decided to do some testing on the value of adding a partition to an existing larger MS SQL table (177 million rows). At the end of the day as with everything in IS, it depends on how the application interacts with the database/table. In my example, I decided to use a yearly partition schema. This seems to be most common approach with larger tables. The below script will walk you through and directly compare the same query on a non-partitioned clustered index to a partitioned cluster index. If you review the pre/post execution plans you will notice that Execution I/O Cost has been greatly reduced with the use of a partition. I approximate around 98% reduction in IO cost. With additional testing on the production table, I feel the reduction in Execution I/O cost from a partition schema is something everyone should look into on larger tables.

-- check for table
IF OBJECT_ID('dbo.TestPartitionTable') IS NOT NULL
BEGIN
DROP TABLE dbo.TestPartitionTable
END
go

-- create table
CREATE TABLE [dbo].[TestPartitionTable]
(
[pkcol] [int] NOT NULL ,
[datacol1] [int] NULL ,
[datacol2] [int] NULL ,
[datacol3] [varchar](50) NULL ,
[partitioncol] DATETIME
)
GO

-- Populate table
DECLARE @val INT
SELECT @val = 1
WHILE @val < 1000001 -- million rows
BEGIN
INSERT INTO dbo.TestPartitionTable
( pkcol ,
datacol1 ,
datacol2 ,
datacol3 ,
partitioncol
)
VALUES ( @val ,
@val ,
@val ,
'TEST' ,
GETDATE() - ( @val * .01 )
)
SELECT @val = @val + 1
PRINT @val
END
GO

/*
-- recovery if needed
INSERT INTO dbo.TestPartitionTable
( pkcol ,
datacol1 ,
datacol2 ,
datacol3 ,
partitioncol
)
SELECT * FROM dbo.TestPartitionTableBackup
*/

-- added standard PK
ALTER TABLE dbo.TestPartitionTable ADD CONSTRAINT PK_TestPartitionTable PRIMARY KEY CLUSTERED (pkcol)
GO
CREATE NONCLUSTERED INDEX IX_TestPartitionTable_col2col3 ON dbo.TestPartitionTable (datacol1,datacol2)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO

-- check plan performance prior to partition

-- Estimated I/O Cost
-- The estimated cost of all I/O activity for the operation. This value should be as low as possible
-- 4.50542
SELECT *
FROM TestPartitionTable
WHERE partitioncol >= '20130101'

-- check for
DROP PARTITION SCHEME myPartitionScheme
go
DROP PARTITION FUNCTION pfDateRange
go

-- create date range partition
CREATE PARTITION FUNCTION pfDateRange (DATETIME)
AS RANGE RIGHT FOR VALUES ('20110101', '20120101','20130101')
GO
CREATE PARTITION SCHEME myPartitionScheme
AS PARTITION pfDateRange ALL TO ([PRIMARY])
GO

-- check the partition schema and range
SELECT ps.name ,
pf.name ,
boundary_id ,
value
FROM sys.partition_schemes ps
INNER JOIN sys.partition_functions pf ON pf.function_id = ps.function_id
INNER JOIN sys.partition_range_values prf ON pf.function_id = prf.function_id

-- drop original cluster index
ALTER TABLE dbo.TestPartitionTable DROP CONSTRAINT PK_TestPartitionTable
GO
-- add noncluster index
ALTER TABLE dbo.TestPartitionTable ADD CONSTRAINT PK_TestPartitionTable PRIMARY KEY NONCLUSTERED (pkcol)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
-- add partition index
CREATE CLUSTERED INDEX IX_TestPartitionTable_partitioncol ON dbo.TestPartitionTable (partitioncol)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON myPartitionScheme(partitioncol)
GO

-- check parition size
SELECT t.name AS TableName ,
i.name AS IndexName ,
p.partition_number ,
p.partition_id ,
i.data_space_id ,
f.function_id ,
f.type_desc ,
r.boundary_id ,
r.value AS BoundaryValue
FROM sys.tables AS t
JOIN sys.indexes AS i ON t.object_id = i.object_id
JOIN sys.partitions AS p ON i.object_id = p.object_id
AND i.index_id = p.index_id
JOIN sys.partition_schemes AS s ON i.data_space_id = s.data_space_id
JOIN sys.partition_functions AS f ON s.function_id = f.function_id
LEFT JOIN sys.partition_range_values AS r ON f.function_id = r.function_id
AND r.boundary_id = p.partition_number
WHERE t.name = 'TestPartitionTable'
AND i.type <= 1
ORDER BY p.partition_number;

-- parition column info
SELECT t.object_id AS Object_ID ,
t.name AS TableName ,
ic.column_id AS PartitioningColumnID ,
c.name AS PartitioningColumnName
FROM sys.tables AS t
JOIN sys.indexes AS i ON t.object_id = i.object_id
JOIN sys.columns AS c ON t.object_id = c.object_id
JOIN sys.partition_schemes AS ps ON ps.data_space_id = i.data_space_id
JOIN sys.index_columns AS ic ON ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.partition_ordinal > 0
WHERE t.name = 'TestPartitionTable'
AND i.type <= 1
AND c.column_id = 1;

SELECT OBJECT_SCHEMA_NAME(i.object_id) AS [schema] ,
OBJECT_NAME(i.object_id) AS [object] ,
i.name AS [index] ,
s.name AS [partition_scheme]
FROM sys.indexes i
JOIN sys.partition_schemes s ON i.data_space_id = s.data_space_id

-- Estimated I/O Cost
-- The estimated cost of all I/O activity for the operation. This value should be as low as possible
-- from 4.50542 to 0.0717498
SELECT *
FROM TestPartitionTable
WHERE partitioncol >= '20130101'

-- reset if needed
--SELECT * INTO TestPartitionTableBackup FROM TestPartitionTable

Reference link

Thursday, May 2, 2013

Talen Data Integration "failed connection" to MS SqlServer 2008

As many others, I ran across the issue connecting to MS SqlServer2008 with Talend 5.03.R86507. I have outlined the key steps in getting the issue resolved below.

Make sure you have a Sql Auth Account
Make sure the Sql Server configured for Sql Auth (Mixed Auth)

Right click on the instance level in SSMS
Select Properties\Security and update to Sql Server and Windows Auth mode

Confirm you can log in with the sql account via SSMS

Depending on the domain, etc...

Make sure to go to Sql Server Configuration Manager

Sql Native Client (version) Configuration

select Client Protocols
Make sure TCP/IP and Named Pipes are "Enabled"
This is key, if this is not done, the connection will fail

Now open Talend and Creation New DB Connection

DB Type: Microsoft Sql Server
Login: the sql auth account you just logged in with via SSMS
Pass: same
Server: is the sql server name and instance if you are not using the default
Port: unless you have changed the port then use 1433
Database
schema
hit Check and you should get "server name connection successful"

After creating the connection, sometimes you need to close out of Talend and reopen to make sure the connection is usable in the repository.

Wednesday, May 1, 2013

TSQL vs PL/SQL

For the past 8+ years I have primarily been working the Microsoft stack and TSQL. Recently I started working more with Oracle PL/SQL and I wanted to keep track of some of the key differences that I come across. First I will start out with the definitions on each. TSQL is Microsoft and Sybase's proprietary extension to Structured Query Language. SQL is a computer language developed for querying, altering and defining relational databases using declarative statements. TSQL expands by adding procedural programming, local variables and various functions. PL/SQL is Oracles procedural extension language for SQL. Oracle databases have three key programming languages, PL/SQL, SQL and Java.

Over the next couple of weeks I will be updating this article with my findings. Some very simple but some a little more complex.

TSQL GetDate() is replaced by PL/SQL select sysdate from dual;

For those of you that don't know what dual is, its a dummy table

Update from sub select

TSQL: update s set s.value = k.value2 from dbo.SomeTable s inner join (select ID, value2 dbo.KTable) k on k.ID = s.ID
PL/SQL: update SomeTable s set(value) = (select value2 from KTable k where k.ID = s.ID)
as you can see the syntax is a little different but will accomplish the same thing

ISNULL is replaced with NVL
Top 10 is replaced with "where rownum <11 li="">
TSQL select @@version vs Oracle select * from v$version where banner like 'Oracle%';
More to come, please check back

References:

Saturday, March 23, 2013

TSQL Tips on Performance

I was researching some query performance tips and came across a great article by Grant Fritchey called "The Seven Sins against TSQL Performance". I would strongly recommend reading over the article. I have outlined a few things that caught my attention.

Functions in Comparisons within the ON or WHERE Clause
Employing Multi-Statement User Defined Functions (UDFs)
Applying Query Hints indiscriminately
Indulging in Nested Views

Reference Link:

Linked server security impact

Over the recent months I have seen a deprecation of performance on some of the linked server queries. I know I should not be using them but sometimes it is a must. At the same time we had been making some security changes on the primary OLTP db. So the question must be asked, Does it matter what the linked server account permission access is as long as it has select on the tables in the case? Well apparently it does.

It turns out that it needs the same permission as is required to execute DBCC SHOW_STATISTICS. Per Books Online, the permissions for DBCC SHOW_STATISTICS are:

User must own the table, or be a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role.

Depending on the method (all records returned and join on the local or one row at a time) the linked server account will need access to the distribution statistics.

Whether it is reasonable to require sysadmin, db_owner, or db_ddladmin in order to process a distributed query efficiently is an issue for a different thread of discussion.

Update: Apparently after reading more, this was resolved in Sql 2012. So if you are running a Sql 2008R2 or less this is a valid concern.

Refernce

fsugeiger

Thursday, June 20, 2013

PL/SQL Purge Recycle Bin

Monday, June 10, 2013

SSAS MDX Query for Current Year, Current Month and Current Day

Thursday, May 23, 2013

DimDate Dimension

Wednesday, May 22, 2013

Value of a Partitioned Table

Thursday, May 2, 2013

Talen Data Integration "failed connection" to MS SqlServer 2008

Wednesday, May 1, 2013

TSQL vs PL/SQL

Saturday, March 23, 2013

TSQL Tips on Performance

Linked server security impact

Tags

Search This Blog

Blog Archive

Links

SharePoint Views

SQLAndy

Devin Knight