Atlantic.Net AI Content & Crawler Policy
Last Updated: August 26, 2025
Introduction
At Atlantic.Net, we support the responsible development and use of artificial intelligence (AI). This policy outlines the rules for accessing and using content from atlantic.net and its subdomains for AI-related purposes, including model training, retrieval-augmented generation (RAG), and content indexing.
Our goal is to enable innovation while protecting our intellectual property, our customers, and the integrity of our services.
This policy applies to:
- Automated agents, AI crawlers, and scrapers accessing our web properties.
- Developers and organizations using Atlantic.Net content in any AI system or workflow.
For specific technical directives, please consult these resources:
- https://www.atlantic.net/robots.txt: Standard crawling rules.
- https://www.atlantic.net/llms.txt: AI-specific usage permissions and preferred content.
- https://www.atlantic.net/sitemap-llms.xml: A sitemap of content mirrored in Markdown for cleaner ingestion.
1. Quick Start for AI Developers & Crawlers
To ensure compliant access, please follow these core principles:
- Respect robots.txt: Always adhere to the directives in our robots.txt file.
- Prioritize Markdown: For cleaner data ingestion, use the Markdown mirror of a page when available (e.g., use /gpu-server-hosting.md for /gpu-server-hosting/). A full list is in /sitemap-llms.xml.
- Follow llms.txt: Use the links in our llms.txt file to identify the canonical pages we have designated for AI use.
- Attribute Correctly: When you display or summarize our content, you must link back to https://www.atlantic.net and also to the specific source page.
- Observe Rate Limits: Do not exceed an average of 1 request per second per IP (see Technical Guidelines for bursting rules).
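As an illustration, the Markdown-mirror convention and the default rate limit above can be sketched in Python. This is a minimal sketch under the rules stated in this policy; the helper names are hypothetical, not part of any Atlantic.Net API:

```python
import time

def markdown_mirror(path: str) -> str:
    """Map a page path to its Markdown mirror per this policy's convention,
    e.g. '/gpu-server-hosting/' -> '/gpu-server-hosting.md'."""
    return path.rstrip("/") + ".md"

class RateLimiter:
    """Pace requests to an average of max_rps requests per second
    (the policy default is 1 rps per IP)."""

    def __init__(self, max_rps: float = 1.0):
        self.min_interval = 1.0 / max_rps
        self.last = 0.0

    def wait(self) -> None:
        # Sleep just long enough to keep the average rate under max_rps.
        now = time.monotonic()
        delay = self.min_interval - (now - self.last)
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()
```

A crawler would call `limiter.wait()` before each request, after first confirming the path is allowed by robots.txt.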
2. Definitions
- AI System: Any model or application that generates, summarizes, translates, classifies, or retrieves text or media.
- Training: Using our content to train, fine-tune, or otherwise adjust the parameters of a machine learning model.
- Inference / Retrieval: Using our content at query time to generate answers, provide summaries, or ground responses in factual data (e.g., RAG).
- Cache: A temporary local store of our content (e.g., embeddings, vector databases, HTML/Markdown copies) to enable faster processing.
3. Permitted Uses (Inference & Retrieval)
You are permitted to use our content for the following inference-oriented purposes:
- Indexing and Summarization: Crawl and create summaries of public-facing pages listed in our sitemaps and /llms.txt.
- Quotation: Quote short excerpts (up to 200 words) with clear attribution and a direct, functional link to the canonical source page.
- Retrieval-Augmented Generation (RAG): Use our content to ground AI-generated answers in fact, provided a clear citation link to the source page is displayed with the output.
- Factual Comparisons: Create accurate, non-misleading comparisons (e.g., feature tables, service checklists) that reflect information published on our site.
4. Prohibited Uses (Requires a Commercial License)
The following activities are strictly prohibited without prior written consent and a formal license from Atlantic.Net:
- Model Training: Using any content from our sites for training, pre-training, fine-tuning, or distilling any AI model.
- Republishing: Republishing substantial portions of our content verbatim or near-verbatim.
- Removing Attribution: Obscuring, removing, or altering our logos, trademarks, certification marks, or copyright notices.
- Misrepresenting Our Services: Generating output that misrepresents our certifications (e.g., HIPAA, SOC 2, PCI DSS), implies endorsement where none exists, or makes false claims about our products.
- Data Harvesting: Collecting data from non-public or interactive sections of our site or attempting to access user data.
- Bypassing Controls: Circumventing any technical controls, including authentication requirements, robots.txt Disallow rules, or rate limits.
To request a license for training or other commercial uses, please contact us at [email protected].
5. Technical Guidelines for Crawlers
- User-Agent: Identify your crawler with a descriptive User-Agent string that includes a contact URL or email address.
- Rate Limits: The default limit is 1 request/second per IP, with bursts of up to 5 rps for no more than 60 seconds.
- Backoff Signals: You must respect HTTP 429 (Too Many Requests) and HTTP 503 (Service Unavailable) responses and honor the Retry-After header.
- Efficient Crawling: Use If-Modified-Since and ETag headers to avoid re-downloading unchanged content. Avoid crawling search endpoints or large media files unless essential.
6. Compliance & Regulated Data
Atlantic.Net provides services under strict compliance frameworks, including HIPAA, HITECH, SSAE 18 SOC 2 & SOC 3, and PCI DSS.
Important: AI systems interacting with our content must:
- Cite the specific source page for any compliance-related claim.
- Never suggest that patient data (PHI), cardholder data (CHD), or other regulated information can be processed by our services in a non-compliant manner.
- Never present aspirational statements as certified facts.
7. Caching and Data Retention
- Duration: Caches used for inference (e.g., embeddings, HTML snapshots) may be retained for up to 30 days.
- Deletion Requests: You must have a mechanism to delete cached copies of our content and any derived data (like embeddings) within 15 days of a verified request from us.
- Cache Headers: Always respect no-store and no-cache HTTP headers.
8. Enforcement
Violations of this policy will result in technical and legal enforcement actions. This may include, but is not limited to, IP or ASN-level blocking, filing of DMCA takedown notices, and pursuit of legal remedies for breach of terms or copyright infringement. We reserve the right to update this policy at any time.
9. Contact Information
For questions, partnerships, or license requests, please contact us at [email protected].