
Rewrite subdomain and URL-path to URL parameters but allow access to files

I’m struggling with my .htaccess file and setting it up the way I want it. The main function is a website that gets the language from the subdomain and the current page from the subfolders.

Requirements

I have three requirements (plus one bonus) for my .htaccess file:

  1. Wildcard subdomain redirected to lang variable
  2. Subfolder(s) redirected to page variable
  3. Local files respected (this is where I’m stuck)
  4. (Bonus) Split up the page variable into segments for each slash; page, sub1, sub2, etc

Examples

  • en.example.com/hello -> /index.php?lang=en&page=hello
  • es.example.com/hola -> /index.php?lang=es&page=hola
  • (Bonus) en.example.com/hello/there/sir -> index.php?lang=en&page=hello&sub1=there&sub2=sir

My current .htaccess

This is my current setup which actually kinda works, if I don’t need any local files (lol). This means local images aren’t found when my .htaccess below is active. I tried adding RewriteCond %{REQUEST_FILENAME} !-f to respect local files but that breaks the whole file it seems – and I don’t know why.

RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{HTTP_HOST} ((?!www).+)\.example\.com [NC]
RewriteRule ^$ /index.php?lang=%1 [L]

RewriteCond %{HTTP_HOST} ((?!www).+)\.example\.com [NC]
RewriteRule ^(.+)$ /index.php?lang=%1&page=$1 [L]

RewriteRule ^index\.php$ - [L]

RewriteRule ^(.*)$ /index.php?page=$1 [L,QSA]


Answer

If your URLs don’t contain dots then exclude dots from your regex – this naturally excludes real files (that contain a dot before the file extension). This avoids the need for a filesystem check.

Your script should handle /index.php?lang=%1 and /index.php?lang=%1&page= exactly the same, so the first rule is superfluous.

RewriteRule ^index\.php$ - [L]

This rule should be first, not embedded in the middle.

Try the following instead:

RewriteRule ^index\.php$ - [L]

RewriteCond %{HTTP_HOST} ^((?!www).+)\.example\.com [NC]
RewriteRule ^([^.]*)$ /index.php?lang=%1&page=$1 [QSA,L]

RewriteRule ^([^.]*)$ /index.php?page=$1 [QSA,L]

Your last rule, which rewrites everything else to index.php without the lang URL param, is questionable. Why not fold this into the preceding rule and validate the language in your script, which you need to do anyway?

Assuming there is always a subdomain, then your rules could then be reduced to:

RewriteRule ^index\.php$ - [L]

RewriteCond %{HTTP_HOST} ^(.+)\.example\.com [NC]
RewriteRule ^([^.]*)$ /index.php?lang=%1&page=$1 [QSA,L]

Requests for the www language are then validated by your script and defaulted accordingly, as if the lang param was not passed at all (which you need to be doing anyway).

If your subdomain is entirely optional and you are accessing the domain apex then make it optional (with a non-capturing group) in the regex:

RewriteCond %{HTTP_HOST} ^(?:(.+)\.)?example\.com [NC]
:

The lang param would then be empty if the domain apex was requested.
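For illustration, the optional-subdomain condition combined with the other rules might look like the following sketch. example.com is the placeholder from the question, and the RewriteEngine On line is an assumption (it is needed once per file but is not shown in the rules above).

```apache
# Assumption: the rewrite engine is not already enabled elsewhere in this file
RewriteEngine On

# Let direct (and already rewritten) requests for the front controller through
RewriteRule ^index\.php$ - [L]

# The subdomain is optional: %1 holds it, or is empty on the domain apex
RewriteCond %{HTTP_HOST} ^(?:(.+)\.)?example\.com [NC]
# Only URL-paths without a dot are rewritten, so real files are left alone
RewriteRule ^([^.]*)$ /index.php?lang=%1&page=$1 [QSA,L]
```

With this, a request for example.com/hello should reach the script with an empty lang param, so the script still has to fall back to a default language.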

(Bonus) en.example.com/hello/there/sir -> index.php?lang=en&page=hello&sub1=there&sub2=sir

It would be preferable (more efficient, flexible, etc) to do this in your PHP script, not .htaccess.

But in .htaccess you could do something like this (instead of the existing rule):

:
RewriteRule ^([^/.]*)(?:/([^/.]+))?(?:/([^/.]+))?(?:/([^/.]+))?(?:/([^/.]+))?$ /index.php?lang=%1&page=$1&sub1=$2&sub2=$3&sub3=$4&sub4=$5 [QSA,L]

The URL params are empty when that path segment is not present.

It is assumed the URL-path does not end in a slash (the above will not match if it does, so a 404 will result). If a trailing slash needs to be permitted then this should be implemented as a canonical redirect to remove the trailing slash. Or reverse the logic to enforce a trailing slash.
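A canonical redirect that removes the trailing slash could be sketched as follows. It is an illustration rather than a tested drop-in: it assumes the .htaccess file is in the document root and that the rule is placed before the internal rewrites. The directory check is there because mod_dir appends a slash to physical directories, and removing it again would cause a redirect loop.

```apache
# Skip physical directories (mod_dir adds the trailing slash back to these)
RewriteCond %{REQUEST_FILENAME} !-d
# /hello/there/ -> /hello/there (the query string is carried over by default)
RewriteRule ^(.+)/$ /$1 [R=302,L]
```

The 302 (temporary) status is a cautious choice while testing; it can be changed to 301 once the redirect is confirmed to behave as intended.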

This particular example allows up to 4 additional “sub” path segments, e.g. hello/1/2/3/4. You can extend this method to allow up to 8 if required, since Apache only provides 9 backreferences ($1 to $9) and the first is used for page. Any more and you will need to use PHP. (You could potentially handle more using .htaccess, but it will get very messy as you will need to employ additional conditions to capture subsequent path segments.)


I tried adding RewriteCond %{REQUEST_FILENAME} !-f to respect local files but that breaks the whole file it seems

That should also be sufficient (if dots are permitted in your URLs). But I wonder where you were putting it? It should not “break” anything – it simply prevents the rule from being processed if the request does map to a file – the rule is “ignored”.
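As a sketch of where the condition could go, assuming dots are permitted in the URL-path (so the pattern no longer excludes them) and that there is always a subdomain:

```apache
RewriteRule ^index\.php$ - [L]

RewriteCond %{HTTP_HOST} ^(.+)\.example\.com [NC]
# Only rewrite when the request does not map to an existing file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.php?lang=%1&page=$1 [QSA,L]
```

Note that RewriteCond directives apply only to the single RewriteRule that immediately follows them. If the condition is added once at the top of a file that contains several rules, the later rules are unaffected and will still rewrite requests for real files, which may be what happened here.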

This is of course assuming you are correctly linking to your resources/static assets using root-relative (starting with a slash) or absolute (starting with scheme + hostname) URLs. If you are using relative URLs then they will probably result in 404s. If this is the case then see my answer to the following question on the Webmasters stack:

User contributions licensed under: CC BY-SA