PHP Getting Domain Name From Subdomain


I need to write a function to parse variables which contain domain names. It's best I explain this with an example. The variable could contain any of these things:

But when passed through my function all of these must return either or, the root domain name basically. I'm sure I've done this before but I've been searching Google for about 20 minutes and can't find anything. Any help would be appreciated.

EDIT: Ignore the, presume that all domains going through this function have a 3 letter TLD.

This Question Has 21 Answers | Original Question | zuk1
$full_domain = $_SERVER['SERVER_NAME'];
$just_domain = preg_replace("/^(.*\.)?([^.]*\..*)$/", "$2", $_SERVER['HTTP_HOST']);

There is no need to list every country's TLD: they are all two letters long, apart from the special ones listed by IANA.

and the tests are here

Comprehensive test suite along with working code. The only caveat is that it won't work with Unicode domain names, but that's another level of data extraction.

From the list, I'm testing against:

$urls = array(
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'localhost' => 'localhost',
'www.localhost' => 'localhost',
'subdomain.localhost' => 'localhost',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'test' => 'test',
'www.test' => 'test',
'subdomain.test' => 'test',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'' => '',
'::1' => '::1',
);

I ended up using the database Mozilla has.

Here's my code:

fetch_mozilla_tlds.php contains the caching algorithm. This line is important:

$mozillaTlds = file('');

The main file used inside the application is this:

function isTopLevelDomain($domain)
{
    $domainParts = explode('.', $domain);
    if (count($domainParts) == 1) {
        return false;
    }

    // Drop the leftmost label; what remains must itself be a domain extension.
    $previousDomainParts = $domainParts;
    array_shift($previousDomainParts);

    $tld = implode('.', $previousDomainParts);

    return isDomainExtension($tld);
}

function isDomainExtension($domain)
{
    $tlds = getTLDs();

    // direct hit
    if (in_array($domain, $tlds)) {
        return true;
    }

    // exception rule ("!" prefix): explicitly not an extension
    if (in_array('!'. $domain, $tlds)) {
        return false;
    }

    $domainParts = explode('.', $domain);

    if (count($domainParts) == 1) {
        return false;
    }

    // Swap the leftmost label for "*" to test against wildcard rules.
    $previousDomainParts = $domainParts;
    array_shift($previousDomainParts);
    array_unshift($previousDomainParts, '*');

    $wildcardDomain = implode('.', $previousDomainParts);

    return in_array($wildcardDomain, $tlds);
}

function getTLDs()
{
    static $mozillaTlds = array();

    if (empty($mozillaTlds)) {
        require 'fetch_mozilla_tlds.php';
        /* @var $mozillaTlds array */
    }

    return $mozillaTlds;
}

The database has evolved and is now available at its own website -
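The functions above only answer a yes/no question. One possible way to go from that kind of check to the root domain itself is sketched below. This is a minimal sketch, not the code I use: the names `is_public_suffix` and `registered_domain` are made up for illustration, the rule list is a tiny hand-written sample in the list's syntax (plain rules, `*.` wildcards, `!` exceptions), and the list's implicit default rule is left out.

```php
<?php
// Hypothetical, tiny rule list; a real one has thousands of entries and
// would be loaded from the downloaded file instead.
$rules = array('com', 'uk', 'co.uk', '*.ck', '!www.ck');

// True when $candidate is a public suffix according to $rules.
function is_public_suffix($candidate, array $rules)
{
    if (in_array('!' . $candidate, $rules, true)) {
        return false;               // exception rule wins
    }
    if (in_array($candidate, $rules, true)) {
        return true;                // direct hit
    }
    $labels = explode('.', $candidate);
    if (count($labels) < 2) {
        return false;
    }
    $labels[0] = '*';               // replace the leftmost label by a wildcard
    return in_array(implode('.', $labels), $rules, true);
}

// Returns the longest matching public suffix plus one label, or false.
function registered_domain($host, array $rules)
{
    $labels = explode('.', strtolower($host));
    $count  = count($labels);

    // Walk from the full host name towards the TLD.
    for ($i = 0; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array('!' . $candidate, $rules, true)) {
            return $candidate;      // exception: the candidate itself is registrable
        }
        if (is_public_suffix($candidate, $rules)) {
            // A host that is nothing but a suffix has no registered domain.
            return $i === 0 ? false : implode('.', array_slice($labels, $i - 1));
        }
    }

    return false;
}
```

With the sample rules, `registered_domain('www.example.co.uk', $rules)` gives `example.co.uk`, and a bare `co.uk` gives `false`. The host names here are placeholders, not the ones from the question.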

Ah - if you just want to handle three character top level domains - then this code works:

// let's test the code works: these should all return
// , or
foreach ($domains as $domain) {
    testdomain($domain);
}

function testdomain($url) {
    if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{3})(\/.*)?$/',$url,$matches)) {
        print 'Domain is: '.$matches[3].'.'.$matches[4].'<br>'."\n";
    } else {
        print 'Domain not found in '.$url.'<br>'."\n";
    }
}

$matches[1]/$matches[2] will contain any subdomain and/or protocol, $matches[3] contains the domain name, $matches[4] the top level domain and $matches[5] contains any other URL path information.
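To make the group numbering concrete, here is a small sketch that wraps the same three-letter pattern and hands the groups back under names. The function name `split_domain` and the array keys are only illustrative, and the sample URL in the note is a placeholder.

```php
<?php
// Same pattern as the three-letter TLD version, with the capture groups
// returned under descriptive (made-up) keys.
function split_domain($url)
{
    $pattern = '/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{3})(\/.*)?$/';
    if (!preg_match($pattern, $url, $m)) {
        return false;
    }

    return array(
        'prefix' => $m[2],                     // subdomain and/or protocol
        'name'   => $m[3],                     // domain name
        'tld'    => $m[4],                     // top level domain
        'path'   => isset($m[5]) ? $m[5] : '', // trailing path, if any
    );
}
```

For `http://www.example.com/index.html` this should give `prefix` `http://www`, `name` `example`, `tld` `com` and `path` `/index.html`. Note that the protocol ends up in the prefix, because the pattern does not treat it separately.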

To match most common top level domains you could try changing it to:

if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{2,6})(\/.*)?$/',$url,$matches)) {

Or to get it coping with everything:

if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.(co\.uk|me\.uk|org\.uk|com|org|net|int|eu)(\/.*)?$/',$url,$matches)) {

etc etc

I think your problem is that you haven't clearly defined what exactly you want the function to do. From your examples, you certainly don't want it to just blindly return the last two, or last three, components of the name, but just knowing what it shouldn't do isn't enough.

Here's my guess at what you really want: there are certain second-level domain names, like, that you'd like to be treated as a single TLD (top-level domain) for purposes of this function. In that case I'd suggest enumerating all such cases and putting them as keys into an associative array with dummy values, along with all the normal top-level domains like com., net., info., etc. Then whenever you get a new domain name, extract the last two components and see if the resulting string is in your array as a key. If not, extract just the last component and make sure that's in your array. (If even that isn't, it's not a valid domain name) Either way, whatever key you do find in the array, take that plus one more component off the end of the domain name, and you'll have your base domain.
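A rough sketch of that lookup, assuming a deliberately tiny table (a real one would need many more entries). The function name `base_domain` and the table contents are just for illustration:

```php
<?php
// Hypothetical table of "effective TLDs": keys are the suffixes, values are
// dummies, so isset() can be used for the lookup.
$effectiveTlds = array(
    'com'   => true,
    'net'   => true,
    'info'  => true,
    'uk'    => true,
    'co.uk' => true,
);

// Returns the suffix plus one more component, or false when no known
// suffix matches or nothing is left in front of the suffix.
function base_domain($name, array $effectiveTlds)
{
    $parts = explode('.', strtolower($name));
    $count = count($parts);

    // Try the last two components first, then only the last one.
    foreach (array(2, 1) as $suffixLength) {
        if ($count < $suffixLength) {
            continue;
        }
        $suffix = implode('.', array_slice($parts, -$suffixLength));
        if (isset($effectiveTlds[$suffix])) {
            if ($count === $suffixLength) {
                return false; // the name is a bare suffix
            }
            // Keep the suffix plus exactly one more component.
            return implode('.', array_slice($parts, -($suffixLength + 1)));
        }
    }

    return false; // not a valid domain name as far as the table knows
}
```

So `base_domain('www.example.co.uk', $effectiveTlds)` comes out as `example.co.uk`, while an unknown ending returns `false`.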

You could, perhaps, make things a bit simpler by writing a function, instead of using an associative array, to tell whether the last two components should be treated as a single "effective TLD." The function would probably look at the next-to-last component and, if it's shorter than 3 characters, decide that it should be treated as part of the TLD.
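Something along these lines, as a sketch only. The function names are made up, and the "shorter than 3 characters" rule is exactly the guess described above, nothing more reliable than that:

```php
<?php
// Heuristic only: treat the last two labels as one "effective TLD" when the
// next-to-last label is shorter than 3 characters (like the "co" in "co.uk").
function has_two_part_tld(array $parts)
{
    $count = count($parts);
    return $count >= 3 && strlen($parts[$count - 2]) < 3;
}

// Keeps the effective TLD plus one more label.
function base_domain_heuristic($name)
{
    $parts = explode('.', strtolower($name));
    $keep  = has_two_part_tld($parts) ? 3 : 2;
    return implode('.', array_slice($parts, -$keep));
}
```

It gets `www.example.co.uk` right (`example.co.uk`), but it misfires on a short registered name such as `www.ab.com`, which it leaves untouched, and on suffixes whose second-level part has three letters.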

This is a short way of accomplishing that:

$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";

I would do something like the following:

// hierarchical array of top level domains
$tlds = array(
    'com' => true,
    'uk' => array(
        'co' => true,
        // …
    ),
    // …
);
$domain = '';
// split domain
$parts = explode('.', $domain);
$tmp = $tlds;
// traverse the tree in reverse order, from right to left
foreach (array_reverse($parts) as $key => $part) {
    if (isset($tmp[$part])) {
        $tmp = $tmp[$part];
    } else {
        break;
    }
}
// build the result
var_dump(implode('.', array_slice($parts, - $key - 1)));

There are two ways to extract the subdomain from a host:

  1. The first method, which is more accurate, is to use a database of TLDs (like public_suffix_list.dat) and match the domain against it. This is a little heavy in some cases. There are some PHP classes for using it, like php-domain-parser and TLDExtract.

  2. The second way is not as accurate as the first one, but it is very fast and gives the correct answer in many cases. I wrote this function for it:

    function get_domaininfo($url) {
        // regex can be replaced with parse_url
        preg_match("/^(https|http|ftp):\/\/(.*?)\//", "$url/" , $matches);
        $parts = explode(".", $matches[2]);
        $tld = array_pop($parts);
        $host = array_pop($parts);
        // a two-letter TLD after a short label is treated as a two-part TLD
        if ( strlen($tld) == 2 && strlen($host) <= 3 ) {
            $tld = "$host.$tld";
            $host = array_pop($parts);
        }

        return array(
            'protocol' => $matches[1],
            'subdomain' => implode(".", $parts),
            'domain' => "$host.$tld",
            'host' => $host,
            'tld' => $tld,
        );
    }




        [protocol] => https
        [subdomain] => mysubdomain
        [domain] =>
        [host] => domain
        [tld] =>

As already said, the Public Suffix List is the only way to parse a domain correctly. I recommend the TLDExtract package; here is sample code:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('');
$result->getSubdomain(); // will return (string) 'here'
$result->getHostname(); // will return (string) 'example'
$result->getSuffix(); // will return (string) 'com'

Building on Jonathan's answer:

function main_domain($domain) {
  if (preg_match('/([a-z0-9][a-z0-9\-]{1,63})\.([a-z]{3}|[a-z]{2}\.[a-z]{2})$/i', $domain, $regs)) {
    return $regs;
  }

  return false;
}

His expression might be a bit better, but this interface seems more like what you're describing.

Stackoverflow Question Archive:

print get_domain(""); // outputs ''

function get_domain($url)
{
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}

Based on

function find_tld($url){

$purl  = parse_url($url);
$host  = strtolower($purl['host']);

$valid_tlds = " .ac.yu .co.yu .org.yu .edu.yu .ac .ad .ae .aero .af .ag .ai .al .am .an .ao .aq .ar .arpa .as .at .au .aw .az .ba .bb .bd .be .bf .bg .bh .bi .biz .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cat .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .com .coop .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .edu .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gov .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .info .int .io .iq .ir .is .it .je .jm .jo .jobs .jp .ke .kg .kh .ki .km .kn .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .mg .mh .mil .mk .ml .mm .mn .mo .mobi .mp .mq .mr .ms .mt .mu .museum .mv .mw .na .name .nc .ne .net .nf .ng .ni .nl .no .np .nr .nu .nz .om .org .pa .pe .pf .pg .ph .pk .pl .pm .pn .post .pr .pro .ps .pt .pw .py .qa .re .ro .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .travel .tt .tv .tw .tz .ua .ug .uk .um .us .uy .uz .va .vc .ve .vg .vi .vn .vuwf .ye .yt .yu .za .zm .zw .ca .cd .ch .cn .cu .cx .dm .dz .ec .ee .es .fr .ge .gg .gi .gr .hk .hn .hr .ht .hu .ie .in .ir .it .je .jo .jp .kr .ky .li .lk .lt .lu .lv .ly .ma .mc .mg .mk .mo .mt .mu .nl .no .nr .nr .pf .ph .pk .pl .pr .ps .pt .ro .ru .rw .sc .sd .se .sg .tj .to .to .tt .tv .tw .tw .tw .tw .ua .ug .us .vi .vn";

    $tld_regex = '#(.*?)([^.]+)('.str_replace(array('.',' '),array('\\.','|'),$valid_tlds).')$#';

    //remove the extension
    preg_match($tld_regex, $host, $matches);

    if(!empty($matches) && sizeof($matches) > 2){
        $extension = array_pop($matches);
        $tld = array_pop($matches);
        return $tld.$extension;

    }else{ //change to "false" if you prefer
        return $host;
    }
}

Here is what I am using. It works great without needing any arrays of TLDs:

$split = array_reverse(explode(".", $_SERVER['HTTP_HOST']));
$domain = $split[1].".".$split[0];

    if(gethostbyname($domain) != $_SERVER['SERVER_ADDR'] && isset($split[2]))
        $domain = $split[2].".".$split[1].".".$split[0];

This isn't foolproof and should only really be used if you know the domain isn't going to be anything obscure, but it's easier to read than most of the other options:

$justDomain = $_SERVER['SERVER_NAME'];
switch(substr_count($justDomain, '.')) {
    case 1:
        // 2 parts. Must not be a subdomain. Do nothing.
        break;

    case 2:
        // 3 parts. Either a subdomain or a 2-part suffix
        // If the 2nd part is over 3 chars, assume it to be the main domain part which means we have a subdomain.
        // This isn't foolproof, but should be ok for most domains.
        // Something like would cause problems, though. As would
        $parts = explode('.', $justDomain);
        if(strlen($parts[1]) > 3) {
            array_shift($parts); // drop the subdomain part
            $justDomain = implode('.', $parts);
        }
        break;

    default:
        // 4+ parts. Must be a subdomain.
        $parts = explode('.', $justDomain, 2);
        $justDomain = $parts[1];
        break;
}

// $justDomain should now exclude any subdomain part.

Here is how you strip the TLD from any URL. I wrote the code to work on my site: - this is a working solution that is in use there.

$host is the URL that has to be parsed. This code is a simple solution and reliable
compared to everything else I have seen. It works on any URL that I have tried!
See this code parsing the page you are looking at right now!


$host = filter_var($_GET['dns']);
$host = $host . '/'; // needed if URL does not have trailing slash

// Strip www, http, https header ;

$host = str_replace( 'http://www.' , '' , $host );
$host = str_replace( 'https://www.' , '' , $host );

$host = str_replace( 'http://' , '' , $host );
$host = str_replace( 'https://' , '' , $host );
$pos = strpos($host, '/'); // find any sub directories
$host = substr( $host, 0, $pos );  //strip directories

$hostArray = explode (".", $host); // count parts of TLD
$size = count ($hostArray) -1; // really only need to know if not a single level TLD
$tld = $hostArray[$size]; // do we need to parse the TLD any further - 
                          // remove subdomains?

if ($size > 1) {
    if ($tld == "aero" or $tld == "asia" or $tld == "biz" or $tld == "cat" or
        $tld == "com" or $tld == "coop" or $tld == "edu" or $tld == "gov" or
        $tld == "info" or $tld == "int" or $tld == "jobs" or $tld == "me" or
        $tld == "mil" or $tld == "mobi" or $tld == "museum" or $tld == "name" or
        $tld == "net" or $tld == "org" or $tld == "pro" or $tld == "tel" or
        $tld == "travel" or $tld == "tv" or $tld == "ws" or $tld == "XXX") {

        $host = $hostArray[$size -1].".".$hostArray[$size]; // parse to 2 level TLD
    } else {
         // parse to 3 level TLD
        $host = $hostArray[$size -2].".".$hostArray[$size -1].".".$hostArray[$size] ;
    }
}

Regex could help you out there. Try something like this:


This script generates a Perl file containing a single function, get_domain from the ETLD file. So say you have hostnames like img1, img2, img3, ... in For each of those get_domain $host would return Note that this isn't the fastest function on earth, so in my main log parser that's using this, I keep a hash of host to domain mappings and only run this for hosts that aren't in the hash yet.


cat << 'EOT' >

sub get_domain {
  $_ = shift;
EOT

wget -O - \
  | iconv -c -f UTF-8 -t ASCII//TRANSLIT \
  | egrep -v '/|^$' \
  | sed -e 's/^\!//' -e "s/\"/'/g" \
  | awk '{ print length($0),$0 | "sort -rn"}' | cut -d" " -f2- \
  | while read SUFF; do
      STAR=`echo $SUFF | cut -b1`
      if [ "$STAR" = '*' ]; then
        SUFF=`echo $SUFF | cut -b3-`
        echo "  return \"\$1\.\$2\.$SUFF\" if /([a-zA-Z0-9\-]+)\.([a-zA-Z0-9\-]+)\.$SUFF\$/;"
      else
        echo "  return \"\$1\.$SUFF\" if /([a-zA-Z0-9\-]+)\.$SUFF\$/;"
      fi
    done >>

cat << 'EOT' >>
}
EOT
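The host-to-domain hash mentioned above is plain memoization. Since the rest of this page is PHP, here is a sketch of the same idea in PHP rather than Perl. The name `make_cached_resolver` is made up, and `$resolver` stands in for whatever slow lookup function you actually have.

```php
<?php
// Wraps any callable that maps a host name to its domain, so the slow
// lookup runs only once per distinct host.
function make_cached_resolver(callable $resolver)
{
    $cache = array();

    return function ($host) use (&$cache, $resolver) {
        if (!array_key_exists($host, $cache)) {
            $cache[$host] = $resolver($host); // slow path, first time only
        }
        return $cache[$host];
    };
}
```

In a log parser you would build the wrapper once and call it for every line. Repeated hosts then cost an array lookup instead of a run through the whole list of patterns.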


Almost certainly, what you're looking for is this:

It's a PHP library that utilizes the (as nearly as is practical) full list of various TLD's that's collected at , and wraps it up in a spiffy little function.

Once the library is included, it's as easy as:

$registeredDomain = getRegisteredDomain( $domain );

As a variant to Jonathan Sampson

function get_domain($url)   {
    if ( !preg_match("/^http/", $url) )
        $url = 'http://' . $url;
    if ( $url[strlen($url)-1] != '/' )
        $url .= '/';
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : '';
    if ( preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs) ) {
        $res = preg_replace('/^www\./', '', $regs['domain'] );
        return $res;
    }
    return false;
}

To do it well, you'll need a list of the second level domains and top level domains and build an appropriate regular expression list. A good list of second level domains is available at Another test case apart from the aforementioned CentralNic variants is The Vatican: their website is technically at http://va : and that's a difficult one to match on!

It is not possible without using a TLD list to compare against, as there exist many cases like or

But even with that you won't have success in every case because of SLDs like or

If you need a complete list you can use the public suffix list:

Feel free to use my function. It won't use regex and it is fast:
