
Perl regex extracting domain name from URL

http://stackoverflow.com/questions/11875630/how-do-i-get-the-host-name-from-a-url-in-perl-using-regex

use strict;
use warnings;

use URI::Split qw/ uri_split uri_join /;

my $scheme_host = do {
  my (@parts) = uri_split 'http://linux.pacific.net.au/primary.xml.gz';
  uri_join @parts[0,1];
};

print $scheme_host;

If you cannot install the module:

use strict;
use warnings;

my $url = 'http://linux.pacific.net.au/primary.xml.gz';

my ($scheme_host) = $url =~ m|^( .*?\. [^/]+ )|x;

print $scheme_host;

outputs: http://linux.pacific.net.au
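To get the scheme and the host as separate values without any module, the fallback regex can be given two captures. This is only a sketch: the helper name scheme_and_host is made up for illustration, and the pattern assumes the URL starts with a scheme followed by "://". Any userinfo or port stays part of the captured host.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answers): split a URL into
# scheme and host with a plain regex, no modules required.
sub scheme_and_host {
    my ($url) = @_;
    # Scheme, then "://", then everything up to the first "/", "?" or "#".
    my ($scheme, $host) = $url =~ m{^([a-z][a-z0-9+.\-]*)://([^/?#]+)}i
        or return;
    return ($scheme, $host);
}

my ($scheme, $host) = scheme_and_host('http://linux.pacific.net.au/primary.xml.gz');
print "${scheme}://$host\n";   # http://linux.pacific.net.au
print "$host\n";               # linux.pacific.net.au
```

An input without a scheme, such as 'example', returns an empty list, so the caller can test for failure.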

http://stackoverflow.com/questions/2497215/extract-domain-name-from-url

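#!/usr/bin/perl
use strict;
use warnings;

my $url = shift @ARGV or die "Usage: $0 URL\n";

# Optional scheme, then a host that contains at least one dot.
if ($url =~ m{([^:]*://)?([^/]+\.[^/]+)}) {
  print "$2\n";
}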

Usage:


./test.pl 'https://example.com'
example.com

./test.pl 'https://www.example.com/'
www.example.com

./test.pl 'example.org/'
example.org

./test.pl 'example.org'
example.org

./test.pl 'example'  -> no output

"And if you just want the domain and not the full host + domain use this instead:"

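#!/usr/bin/perl
use strict;
use warnings;

my $url = shift @ARGV or die "Usage: $0 URL\n";

# Skip leading subdomain labels and capture the last two dot-separated parts of the host.
if ($url =~ m{([^:]*://)?([^/]*\.)*([^/.]+\.[^/]+)}) {
  print "$3\n";
}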
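A quick way to see what this pattern does and does not handle is to wrap it in a sub and run it over a few inputs. The sketch below uses the same pattern, written with m{} delimiters so the slashes need no escaping. The name last_two_labels is hypothetical and the sample URLs are made up.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the "domain only" pattern shown above.
# Returns the captured text, or undef when nothing matches.
sub last_two_labels {
    my ($url) = @_;
    return $url =~ m{([^:]*://)?([^/]*\.)*([^/.]+\.[^/]+)} ? $3 : undef;
}

for my $url (qw(
    https://www.example.com/
    example.org
    http://www.example.co.uk/page
)) {
    my $domain = last_two_labels($url);
    print "$url => ", (defined $domain ? $domain : '(no match)'), "\n";
}
```

The first two inputs give example.com and example.org. The third gives co.uk, because the pattern always keeps the last two dot-separated parts of the host and knows nothing about multi-part suffixes.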

http://stackoverflow.com/questions/15627892/perl-regex-to-get-the-root-domain-of-a-url

my $facebook = "www.facebook.com/xxxxxxxxxxx";

# Keep what follows "www." up to and including ".com"; drop the rest.
$facebook =~ s/www\.(.*\.com).*/$1/;

print $facebook;

"Returns"

facebook.com

"You may also want to make this work for .net, .org, etc. Something like:"

s/www\.(.*\.(?:net|org|com)).*/$1/;
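As a rough sketch, the multi-TLD substitution can be put in a small sub that works on a copy, so the original string is not modified. The name strip_www and the sample inputs are made up for illustration, and the TLD list is just the one from the quote above.

```perl
use strict;
use warnings;

# Hypothetical helper; the TLD list is only an illustration.
sub strip_www {
    my ($host) = @_;
    # Substitute on a copy so the caller's string is left untouched.
    (my $domain = $host) =~ s/www\.(.*\.(?:net|org|com)).*/$1/;
    return $domain;
}

print strip_www($_), "\n" for qw(
    www.facebook.com/xxxxxxxxxxx
    www.example.org/path
    www.perl.net
);
```

This prints facebook.com, example.org and perl.net. A string without "www." comes back unchanged, since the substitution does not match.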

#perl - #programming - #regex

From JR's : articles
created on Oct 02, 2013 at 02:16:11 pm