
Perl regex extracting domain name from URL

http://stackoverflow.com/questions/11875630/how-do-i-get-the-host-name-from-a-url-in-perl-using-regex

use strict;
use warnings;

use URI::Split qw/ uri_split uri_join /;

my $scheme_host = do {
  my (@parts) = uri_split 'http://linux.pacific.net.au/primary.xml.gz';
  uri_join @parts[0,1];
};

print $scheme_host;

If the module cannot be installed, a regex does the same job:

use strict;
use warnings;

my $url = 'http://linux.pacific.net.au/primary.xml.gz';

my ($scheme_host) = $url =~ m|^( .*?\. [^/]+ )|x;

print $scheme_host;

outputs: http://linux.pacific.net.au
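
As a sketch of a stricter module-free alternative, the generic URI regex from RFC 3986 (Appendix B) can be cut down to its scheme and authority parts. The helper name scheme_and_host is hypothetical, and the authority it returns still includes any userinfo or port present in the URL.

```perl
use strict;
use warnings;

# Hypothetical helper: uses the scheme and authority portions of the
# generic URI regex given in RFC 3986, Appendix B.
sub scheme_and_host {
    my ($url) = @_;
    my ($scheme, $authority) =
        $url =~ m{^(?:([^:/?#]+):)?(?://([^/?#]*))?};
    # Both parts are optional in the regex, so check that both were found.
    return unless defined $scheme && defined $authority;
    return "$scheme://$authority";
}

print scheme_and_host('http://linux.pacific.net.au/primary.xml.gz'), "\n";
# prints: http://linux.pacific.net.au
```

Unlike the shorter regex above, this one does not require a dot in the host, and it returns nothing for input without a scheme, such as 'example.org/'.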

http://stackoverflow.com/questions/2497215/extract-domain-name-from-url

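#!/usr/bin/perl
use strict;
use warnings;

my $url = shift @ARGV or die "Usage: $0 URL\n";

# Optional "scheme://" prefix, then the host: non-slash characters containing a dot.
# The /g flag is not needed for a single match, so it is dropped.
if ($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/) {
  print "$2\n";
}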

Usage:


./test.pl 'https://example.com'
example.com

./test.pl 'https://www.example.com/'
www.example.com

./test.pl 'example.org/'
example.org

./test.pl 'example.org'
example.org

./test.pl 'example'  -> no output
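
The same check can be wrapped in a reusable function. This is a minimal sketch, not the original answer's code: the name host_from_url is hypothetical, and the pattern is a variation of the one above, anchored at the start of the string so that a dotted path segment cannot be mistaken for the host.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the host part, or undef when nothing matches.
sub host_from_url {
    my ($url) = @_;
    return undef unless defined $url;
    # Optional "scheme://" prefix, then non-slash characters containing a dot.
    return $url =~ m{^(?:[^:]*://)?([^/]+\.[^/]+)} ? $1 : undef;
}

for my $url ('https://example.com', 'https://www.example.com/', 'example.org/', 'example') {
    my $host = host_from_url($url);
    printf "%-28s %s\n", $url, defined $host ? $host : '(no match)';
}
```

For the four inputs in the loop this yields example.com, www.example.com, example.org, and no match, the same results as the usage listing above.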

"And if you just want the domain and not the full host + domain use this instead:"

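#!/usr/bin/perl
use strict;
use warnings;

my $url = shift @ARGV or die "Usage: $0 URL\n";

# For a host such as www.example.com, $3 holds example.com.
# The /g flag is not needed for a single match, so it is dropped.
if ($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/) {
  print "$3\n";
}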
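
A small sketch makes the behaviour of this pattern easier to see, including one limitation. The helper name last_two_labels is hypothetical, and the pattern is an anchored, non-capturing variation of the one above. Because it simply keeps the last two dot-separated labels, a host under a two-part suffix such as co.uk comes back as the suffix alone.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps only the last two dot-separated labels of the host.
sub last_two_labels {
    my ($url) = @_;
    return $url =~ m{^(?:[^:]*://)?(?:[^/]*\.)*([^/.]+\.[^/]+)} ? $1 : undef;
}

print last_two_labels('https://www.example.com/'), "\n";  # example.com
print last_two_labels('example.org'), "\n";               # example.org
print last_two_labels('http://news.bbc.co.uk/'), "\n";    # co.uk (not bbc.co.uk)
```

Getting bbc.co.uk out of the last input needs a list of known public suffixes, which a regex alone cannot supply.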

http://stackoverflow.com/questions/15627892/perl-regex-to-get-the-root-domain-of-a-url

my $facebook = "www.facebook.com/xxxxxxxxxxx";

$facebook =~ s/www\.(.*\.com).*/$1/; # keep what follows "www." up to and including ".com"

print $facebook;

"Returns"

facebook.com

"You may also want to make this work for .net, .org, etc. Something like:"

s/www\.(.*\.(?:net|org|com)).*/$1/;
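
One way to keep the list of endings maintainable is to build the alternation from an array. The following is only a sketch: the names @tlds, $tld_re, and strip_www are hypothetical, the list is illustrative, and unlike the one-liner above it also tolerates an http:// or https:// prefix and a missing "www.".

```perl
use strict;
use warnings;

my @tlds   = qw(com net org);                        # illustrative list, extend as needed
my $tld_re = join '|', map { quotemeta($_) } @tlds;  # becomes "com|net|org"

# Hypothetical helper: returns the input unchanged when the pattern does not match.
sub strip_www {
    my ($url) = @_;
    # Non-greedy .*? stops at the first listed ending that is followed
    # by a path or by the end of the string.
    $url =~ s{^(?:https?://)?(?:www\.)?(.*?\.(?:$tld_re))(?:/.*)?\z}{$1}i;
    return $url;
}

print strip_www('www.facebook.com/xxxxxxxxxxx'), "\n";  # facebook.com
print strip_www('http://www.example.org'), "\n";        # example.org
```

Requiring a "/" or the end of the string after the ending keeps a ".com" that appears later in the path from being picked up as part of the domain.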

http://www.perlmonks.org/?node_id=670802

http://www.willmaster.com/blog/perl/extracting-domain-name-from-url.php

"First, remove the http/https and possible www. from the front of the URL:"

$url =~ s!^https?://(?:www\.)?!!i;

"Then, strip off everything from the first "/" to the end of the URL (doing nothing if there is no "/"):"

$url =~ s!/.*!!;

"Last, in case the URL was (http://example.com?stuff) or (http://example.com#stuff) or (http://example.com:80/whatever), also strip off everything from the first "?" or "#" or ":", if present:"

$url =~ s/[\?\#\:].*//;

"The value of $url is now the domain name by itself."

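The three substitutions above can be combined into one function. This is a minimal sketch; the name domain_from_url is hypothetical, and the character class is written without the backslashes, which are not needed inside brackets.

```perl
use strict;
use warnings;

# Hypothetical helper applying the three steps in order.
sub domain_from_url {
    my ($url) = @_;
    $url =~ s!^https?://(?:www\.)?!!i;  # 1. drop the scheme and an optional "www."
    $url =~ s!/.*!!;                    # 2. drop everything from the first "/"
    $url =~ s/[?#:].*//;                # 3. drop a query, fragment, or port
    return $url;
}

print domain_from_url('http://www.example.com/a/b?x=1'), "\n";  # example.com
print domain_from_url('https://example.com:8080/x'), "\n";      # example.com
print domain_from_url('http://example.com?stuff'), "\n";        # example.com
```

One caveat: because step 3 cuts at the first ":", a URL with credentials such as http://user:pw@example.com/ comes back as "user" rather than the host.
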
#perl - #programming - #regex

From JR's : articles
created on Oct 02, 2013 at 02:16:11 pm
updated on Oct 02, 2013 at 02:24:26 pm