Perl regex extracting domain name from URL
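use strict;
use warnings;
use URI::Split qw/ uri_split uri_join /;
# uri_split returns (scheme, authority, path, query, fragment);
# joining only the first two gives back "scheme://host".
my $scheme_host = do {
    my @parts = uri_split 'http://linux.pacific.net.au/primary.xml.gz';
    uri_join @parts[0,1];
};
print "$scheme_host\n";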
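If you only need the host name without the scheme, the URI module can return it directly. URI::Split ships in the same URI distribution, so it should already be available wherever the example above runs. This is a minimal sketch that assumes the URL always carries a scheme such as http://. A bare string like example.org is treated as a relative URI rather than a server address, so ->host is not available on it.

```perl
use strict;
use warnings;
use URI;

# URI->new picks a scheme-specific class (URI::http here);
# host() returns just the host name, without scheme, port or path.
my $uri = URI->new('http://linux.pacific.net.au/primary.xml.gz');
print $uri->host, "\n";      # linux.pacific.net.au
```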
If you cannot install the module, a plain regex does the same job:
use strict;
use warnings;
my $url = 'http://linux.pacific.net.au/primary.xml.gz';
# Lazily match up to the first ".", then take every following non-"/" character.
my ($scheme_host) = $url =~ m|^( .*?\. [^/]+ )|x;
print "$scheme_host\n";
Both versions print: http://linux.pacific.net.au
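The fallback regex is looser than it looks: it takes everything up to the first dot anywhere in the string, then every following non-"/" character. The sketch below wraps it in a sub (scheme_host is an illustrative name, not from the original) and runs it on a URL with a port and on a dotless host. The sample URLs are made up for the demonstration.

```perl
use strict;
use warnings;

# Same pattern as the fallback above; scheme_host is a hypothetical helper name.
sub scheme_host {
    my ($url) = @_;
    my ($match) = $url =~ m|^( .*?\. [^/]+ )|x;
    return $match;    # undef when the string contains no dot at all
}

for my $url ('http://linux.pacific.net.au/primary.xml.gz',
             'http://example.com:8080/index.html',
             'http://localhost/file.txt') {
    print scheme_host($url) // '(no match)', "\n";
}
```

The first two come back as http://linux.pacific.net.au and http://example.com:8080 (the port is kept), but http://localhost/file.txt is returned whole, because the first dot in that string belongs to the file name rather than the host.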
http://stackoverflow.com/questions/2497215/extract-domain-name-from-url
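#!/usr/bin/perl -w
use strict;
my $url = shift or die "usage: $0 URL\n";
# $1: optional "scheme://" prefix; $2: the first run of non-"/" characters that contains a dot (the host)
if ($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/) {
    print "$2\n";
}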
Usage:
./test.pl 'https://example.com'
example.com
./test.pl 'https://www.example.com/'
www.example.com
./test.pl 'example.org/'
example.org
./test.pl 'example.org'
example.org
./test.pl 'example' -> no output
"And if you just want the domain and not the full host + domain use this instead:"
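#!/usr/bin/perl -w
use strict;
my $url = shift or die "usage: $0 URL\n";
# ([^\/]*\.)* swallows leading labels such as "www.", so $3 holds the last two dot-separated labels of the host
if ($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/) {
    print "$3\n";
}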
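To see how the two Stack Overflow regexes differ, here they are wrapped in subs (host_of and domain_of are illustrative names, not from the answers) with the /g flag dropped, since a single scalar-context match is all the scripts need. The second sample URL is a made-up example with a two-part suffix.

```perl
use strict;
use warnings;

# First regex: returns the whole host ($2).
sub host_of {
    my ($url) = @_;
    return $url =~ m{([^:]*://)?([^/]+\.[^/]+)} ? $2 : undef;
}

# Second regex: drops leading labels and returns the last two ($3).
sub domain_of {
    my ($url) = @_;
    return $url =~ m{([^:]*://)?([^/]*\.)*([^/.]+\.[^/]+)} ? $3 : undef;
}

for my $url ('https://www.example.com/', 'https://www.bbc.co.uk/') {
    printf "%s host=%s domain=%s\n", $url, host_of($url), domain_of($url);
}
```

For https://www.example.com/ the subs give www.example.com and example.com. For https://www.bbc.co.uk/ the second regex returns co.uk, because it simply keeps the last two dot-separated labels and has no notion of multi-part suffixes such as .co.uk.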
http://stackoverflow.com/questions/15627892/perl-regex-to-get-the-root-domain-of-a-url
my $facebook = "www.facebook.com/xxxxxxxxxxx";
$facebook =~ s/www\.(.*\.com).*/$1/; # keep everything after "www." up to and including ".com"
print "$facebook\n";
"Returns"
facebook.com
"You may also want to make this work for .net, .org, etc. Something like:"
$facebook =~ s/www\.(.*\.(?:net|org|com)).*/$1/;
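The greedy .* in this substitution can reach past the host: it keeps everything up to the last .net, .org or .com in the whole string. The sketch below compares the posted pattern with a variant that uses [^\/]* so the capture stays inside the host. That variant and the sample strings are an assumption for illustration, not part of the quoted answer. Note that neither version changes a string that lacks the www. prefix.

```perl
use strict;
use warnings;

# as_posted: the substitution from the answer above.
sub as_posted {
    my ($s) = @_;
    $s =~ s/www\.(.*\.(?:net|org|com)).*/$1/;
    return $s;
}

# host_only: hypothetical variant; [^\/]* cannot cross a "/",
# so the capture is limited to the host part.
sub host_only {
    my ($s) = @_;
    $s =~ s/www\.([^\/]*\.(?:net|org|com)).*/$1/;
    return $s;
}

for my $s ('www.facebook.com/xxxxxxxxxxx',
           'www.example.com/notes.org',
           'facebook.com/xxxxxxxxxxx') {
    printf "%s -> %s | %s\n", $s, as_posted($s), host_only($s);
}
```

Both give facebook.com for the first string. For www.example.com/notes.org the posted pattern returns example.com/notes.org while the variant returns example.com, and the third string comes back unchanged from both.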
http://www.perlmonks.org/?node_id=670802
http://www.willmaster.com/blog/perl/extracting-domain-name-from-url.php
"First, remove the http/https and possible www. from the front of the URL:"
$url =~ s!^https?://(?:www\.)?!!i;
"Then, strip off everything from the first '/' to the end of the URL (doing nothing if there is no '/'):"
$url =~ s!/.*!!;
"Last, in case the URL was (http://example.com?stuff) or (http://example.com#stuff) or (http://example.com:80/whatever), also strip off everything from the first '?' or '#' or ':', if present:"
$url =~ s/[\?\#\:].*//;
"The value of $url is now the domain name by itself."
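The three substitutions can be run in order inside one sub. domain_from_url is a hypothetical name and the sample URLs are made up; the regexes are copied unchanged from the steps above.

```perl
use strict;
use warnings;

# Hypothetical helper applying the three steps in order.
sub domain_from_url {
    my ($url) = @_;
    $url =~ s!^https?://(?:www\.)?!!i;  # step 1: drop http(s):// and a leading www.
    $url =~ s!/.*!!;                    # step 2: drop everything from the first /
    $url =~ s/[\?\#\:].*//;             # step 3: drop a query, fragment or port
    return $url;
}

for my $url ('http://www.example.com/page.html',
             'https://example.com?stuff',
             'http://example.com#stuff',
             'HTTP://WWW.Example.com:80/whatever') {
    print domain_from_url($url), "\n";
}
```

All four sample URLs reduce to example.com, except that the last keeps its original capitalization (Example.com), so apply lc if you need a normalized name. Because step 1 only recognizes http and https, other schemes fall through: ftp://example.com/x comes out as ftp.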
#perl - #programming - #regex
From JR's : articles
Related articles
Perl regex extracting domain name from URL code example - Oct 02, 2013
Using Perl and Erlang - Jan 19, 2015
Rest and Json links to keep around - Oct 10, 2013
Perl Dancer Framework - Dec 19, 2013
Add HTML Scrubber to Junco - Oct 01, 2013