Kapil Dave

Friday, July 30, 2010

PHP Developers

Over the years I've observed that the least attentional programmers in industry are the PHP programmers. They are just simply unaware. Their domain is Web, but they are almost blind about what's happening in world around Web. Go and ask a .NET developer, he will utter out all the changes that happened since last few versions on framework and he will also mention the anticipated changes in next version. While I'm not a great fan of .NET developers' understanding of nitty-gritty of technologies I must say - they know what they need to! And PHP developers? Well, they just don't know anything, neither they know that what to know. They would start working on any project without even understanding it, let alone giving attention to the detail - as if they are assigned to dig a hole in said time.

The poorness of development methodology of PHP programmers is second only to VB programmers. Bad programming approach can easily be seen in most of the core developments. Hack in opensource is regarded as skill.

People talk about PHP not being scalable as compared to .NET or Java. What is scalability? Scalability depends hugely on the architecture of the system and less on the platform. While .NET and Java have their inbuilt frameworks, we avoid frameworks. PHP 5 has pretty good support for OOPS and there is no reason why it should not be scalable. If it was not scalable, how Joomla is so scalable that it can meet almost any kind of requirement? Even Drupal achieves this aspect without being OOPS (however in not so pleasant way). So it's the PHP developers who let their language down.

There are reasons for this. A PHP developer is almost always on red alert state because most PHP projects are Project Based and not Prodcut Based. Industry requires output, and in PHP - fast output. There are few finance related issues in PHP platform that I would like to avoid here. But none of them should stop programmers from evolving.

Kapil Dave.

Tuesday, April 6, 2010

Why to And How to use Denormalization

Please read this post first if If you have come directly on this post.

So now as we know the types of Denormalization let's discuss why to and how to use them.

First of all - why to use Denormalization?

For the ideologists Denormalization would be a bad approach, but in practice you would have to use it in one or another way.

For three reasons

1). Performance

This would be the biggest reason to use denormalization. Denormalization significantly reduces the query run time by reducing the number of joins. Understand that each join statement does multiplication of the number of rows of each joining tables. Imagine a join statement between 3 tables that contain millions of records each!! How cool if you can avoid refering at least one of the two tables from the master table? Just have the required value lieing in table 3 in the master table itself and you are saved from referring to that table!! However a great amount of wisdom is required to do this. More to come on this later.

2). Database readability

Not as big reason as performance but sometimes you don't want your database to be very complex, specially in smaller projects. I've seen "Half Technical" clients wanting to be able to access the database without much exercise.

3). Ease of coding

You are saved from writing huge joins for doing simple operations with your denormalized database. Actually I don't know why I'm writing this. Who cares for poor programmers in this cruel world!! They are supposed to be supermen with no feeling!!

Now I will explain "How" to use the denormalization.

So from where to start? The first thing you need to know is "Do not create a denormalized database directly". To do denormalization you should always create a conventional normalized database first. This will give you clear idea of what type and to what extent you actually need denormalization. Remember, Denormalization is not simply opposite to Normalization but it's the next step to the normalization.

1). Reference Centralization Method

As I mentioned in my earlier post sometimes you see some really unnecessary normalization done on database. I would recommend to wisely avoid that. As a DB admin you have to take wise decision between the performance and maintainability. There is no harm to do denormalization for performance if a simple update statement can solve your maintainability problem. More often than not it is wise to not to use over normalized database. I've seen databases which had separate tables even for phone numbers and fax numbers!! Upon asking they gave me a very unconvincing reason but what it did was very painful for me. I had to write huge joins for very simple operations. Not required to mentioned that it slowed down that huge database a great deal.

2). CSV Makeshift Denormalization

This style is recommended if you are not going to need to do join using the reference values. The decision should be visionary otherwise it may cause stay backs for you in office and "enjoy" the company sponsored food!!

3). Scattered Denormalization

Think of Google Buzz here. They list the post description along with the user's name. Think how ofthen they have to do joins on post and user table considering all the conversation threads are real time Ajax driven. And also think how big the user table and post table would be. Now think how cool it would be if they had put the user's name along with the user's ID in the post table itself. They would not have to join to the user table at all!! This might look cool but this has to be a very wise decision of an experienced man. In this case Buzz has to update all the post records whenever user updates his name in the profile. I'm sure Google would not have done this. I would do this only if I have to run the said query VERY VERY often in the system and my hardware resources are very very poor in comparison to Google's.

Types of Denormalization of database

We all know what is normalization of the database and many of you know what is denormalization as well. What this post is meant for is to define the methods of the denormalization.

There is very little theoretical info available on the web on denormalization despite it being widely used. In fact every programmer would have used denormalization in one way or other in many projects. So since I could not find any theoretical explaination of it I would try to define the types and their prescribed useage myself.

So children, let's start the "Denormalization Class".

There are three types of denormalizations (Please forgive my poor vocabulary if you find the names of the types not much convincing. Suggestions are always welcomed!).

1). Reference Centralization Method

I bet every programmer would have used this method in each database driven project that they have worked on. In this method you simply avoid unnecessary normalization on your database. An example would explain this better.

In a User table you would want to have references to the address and phone numbers. Normally you would simply add address and phone fields in the user table. Believe it or not this is "Reference Centralization Method". If you wanted to normalize it you would need to have address table reference ID in the user table, then phone number ID in the Addresses table to reference the Phones table and so on. So in short, when you decide to not to have an over normalized table - and have some fixed fields instead - you are using Reference Centralization Method.

One can argue that it cannot be termed "Denormalization" just because it's not normalization. But hey, I am here to classify the types of denormalization and I can clearly see this as one of the types. So what if it's so commonly used? I cannot ignore it.

See this link to know when to use it.

2). CSV Makeshift Denormalization

Sometimes you prefer to not to create an association table to link two tables, you put CSV values in one of the master tables just to keep it simple. This method is CSV Makeshift Denormalization.

See this link to know when to use it.

3). Scattered denormalization

Hmm. This one is interesting. This is completely opposite to the ideology of normalization. In this you store a single set of information in multiple places. For example instead of having just User_ID in the Blog_Description table you would store User_Name, User_Rating along with User_ID in parallel to the User table. Thus you have to update the Blog_Description table each time you update the User table. It may look weird but it's actually useful in some cases.

See this link to know when to use it.

Anymore type you can think of? Please suggest.

To keep the post smaller I will write another post to prescribe when to and how to use different types of Denormalizations.

Friday, March 19, 2010

Array in MySql stored procedures

Hi guyz,

I saw a weird short coming in MySql stored procedures today.
It doesn't allow to pass in or pass out arrays.
The reason (I would say unreasonable reason) is that they only support variable types that they support for the data types in database storage. May be the architecture of MySql limits them. I'm not an expert of database engine architectures by any means but if other databases can offer, MySql MUST be able to offer this facility.

The alternative for this (as explained in MySql's developer forum) to create a temporary table in which you can store the output (or input) of the stored procedure and then you can fetch them.

Now, why would I do this? As for performance optimization purpose I would want to keep the database calls by PHP to MySql in minimum number. So when I want to make multiple fetch operation for a single task I would use stored procedure. But if I have to redirect my output in an another table I will have to make a PHP call to fetch the result from the tmp table anyways!! This would kill it's very reason.

You can pass CSV in string variable in place of a single dimension array, but what about multi dimension array? 9 out of 10 times you would need multi dimension array in practical scenario.

So what is the solution? Sadly, nothing.
Suddenly an alternative blinked to me and I tried it.
I simply furnished the output array in serialized format manually from the stored procedure.
Since the array became a string it was easy to pass to PHP, and unserialize an array in PHP is a cake walk!!

Try passing the following string from a stored procedure to PHP

a:2:{i:0;a:4:{s:5:"title";s:29:"MySql does not support arrays";s:11:"description";s:47:"Strange problem!! MySql does not support arrays";s:11:"website_url";s:20:"kapsdave.blogger.com";s:6:"number";i:1;}i:1;a:4:{s:5:"title";s:50:"How to pass out array in stored procedure in MySql";s:11:"description";s:35:"You can do this using serialization";s:11:"website_url";s:20:"kapsdave.blogger.com";s:6:"number";i:1;}}Array ( [0] => Array ( [title] => MySql does not support arrays [description] => Strange problem!! MySql does not support arrays [website_url] => kapsdave.blogger.com [number] => 1 ) [1] => Array ( [title] => How to pass out array in stored procedure in MySql [description] => You can do this using serialization [website_url] => kapsdave.blogger.com [number] => 1 ) )

And then unserialize the above text.
You will get an array similar to the following.

Array ( [0] => Array ( [title] => MySql does not support arrays [description] => Strange problem!! MySql does not support arrays [website_url] => kapsdave.blogger.com [number] => 1 ) [1] => Array ( [title] => How to pass out array in stored procedure in MySql [description] => You can do this using serialization [website_url] => kapsdave.blogger.com [number] => 1 ) )

I'm not sure if anybody has tried this before but I could not find anybody mentioning this on web so I finally decided to write a blog post.

Hope this helps!!

Thursday, January 14, 2010

XSLT Using PHP

Hello,

PHP being a ridiculously easy web scripting language, can be used (and is used) as a great front end tool for any high scale application. In many cases you need to have a back end of Java, C++ etc. but it may not be a great idea to create front end (i.e. web end) using those languages. One of the way to combine these back end with front end would be the use of XSLT transformation.

Your back end can create an XML. Using PHP-XSLT you can create the HTML front end for this application. The original XML will content the variables that you want to show on the front end. XSLT - being a transformation language - would convert the XML to HTML. All three (XML, XSLT and HTML) being markup languages, this process is easy and fast too.

While using Java and C++ are such platforms that will require such transformation for front end, I have used it for .Net. I know .Net already has ASP.Net which can do this but in our requirement, we had to do the transformation from VB.Net to PHP. No matter how weird it may sound, but it was a genuine requirement mainly because of the legacy of the program. I would discuss it in detail sometime.

To use PHP-XSLT transformation refer this link - http://atom.dosimple.ch/docs/html/d1/d3/xslt_8class_8php-source.html