<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[ValueCurve: Safe Mode]]></title><description><![CDATA[AI Ethics, Security and Governance]]></description><link>https://on.valuecurve.ai/s/safemode</link><image><url>https://substackcdn.com/image/fetch/$s_!3AgI!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcb3fea-b543-4848-9539-ab6ba8c51766_500x500.png</url><title>ValueCurve: Safe Mode</title><link>https://on.valuecurve.ai/s/safemode</link></image><generator>Substack</generator><lastBuildDate>Sun, 12 Apr 2026 16:17:19 GMT</lastBuildDate><atom:link href="https://on.valuecurve.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Sarfaraz Mulla]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[valuecurve@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[valuecurve@substack.com]]></itunes:email><itunes:name><![CDATA[Sarfaraz Mulla]]></itunes:name></itunes:owner><itunes:author><![CDATA[Sarfaraz Mulla]]></itunes:author><googleplay:owner><![CDATA[valuecurve@substack.com]]></googleplay:owner><googleplay:email><![CDATA[valuecurve@substack.com]]></googleplay:email><googleplay:author><![CDATA[Sarfaraz Mulla]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Hidden Cost of Accidental Data Exposure]]></title><description><![CDATA[Personally Identifiable Information is any data that can identify a specific individual, either directly or when combined with other information.]]></description><link>https://on.valuecurve.ai/p/why-your-data-might-be-leaking-pii</link><guid 
isPermaLink="false">https://on.valuecurve.ai/p/why-your-data-might-be-leaking-pii</guid><dc:creator><![CDATA[Sarfaraz Mulla]]></dc:creator><pubDate>Sun, 28 Dec 2025 03:31:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e8c52428-1c32-4dc9-9846-2158e1913d9f_6000x3335.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every day, sensitive information leaks through channels we don&#8217;t think twice about: a stack trace pasted into a Slack message, a customer email forwarded to a vendor, a debug log shared in a GitHub issue. These aren&#8217;t malicious breaches&#8212;they&#8217;re ordinary workflows that happen to contain data that shouldn&#8217;t be shared.</p><p>The problem isn&#8217;t carelessness. It&#8217;s that <strong>PII (Personally Identifiable Information) </strong>is often invisible until you know to look for it.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;82d745bd-5e2c-43d8-a5c6-5a641ebe818e&quot;,&quot;duration&quot;:null}"></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://on.valuecurve.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://on.valuecurve.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h4>What Counts as PII?</h4><p>PII is any data that can identify a specific individual, either directly or when combined with other information. 
The definition varies by regulation, but generally includes:</p><p><strong>Direct identifiers</strong> &#8212; Data that points to a specific person on its own:</p><ul><li><p>Full names</p></li><li><p>Email addresses</p></li><li><p>Phone numbers</p></li><li><p>Social Security Numbers</p></li><li><p>Passport and driver&#8217;s license numbers</p></li><li><p>Biometric data</p></li></ul><p><strong>Indirect identifiers</strong> &#8212; Data that can identify someone when combined:</p><ul><li><p>IP addresses</p></li><li><p>Device IDs</p></li><li><p>Location data</p></li><li><p>Dates of birth</p></li><li><p>Employment information</p></li></ul><p><strong>Financial data</strong> &#8212; Often regulated separately but equally sensitive:</p><ul><li><p>Credit card numbers</p></li><li><p>Bank account and routing numbers</p></li><li><p>IBAN codes</p></li></ul><p><strong>Authentication secrets</strong> &#8212; Not traditionally &#8220;PII&#8221; but equally dangerous:</p><ul><li><p>API keys and tokens</p></li><li><p>Passwords</p></li><li><p>Private keys</p></li><li><p>Session tokens</p></li></ul><p>The last category is often overlooked. An exposed AWS key isn&#8217;t personal information, but it can grant access to systems containing millions of personal records. The blast radius of a leaked credential often exceeds that of a leaked SSN.</p><blockquote><p><strong>Credentials are the keys to the PII vault.</strong> A single exposed API key can unlock databases containing millions of personal records. That&#8217;s why effective scanning must detect both PII and secrets.</p></blockquote><div><hr></div><h4>Where PII Hides</h4><p>The obvious places&#8212;databases, CRM systems, HR files&#8212;usually have controls. The risk is in the unstructured data that flows through daily work:</p><p><strong>Support tickets</strong>: A customer reports a bug and includes their full account details. The ticket gets escalated, exported to a spreadsheet, shared with engineering. 
Each hop increases exposure.</p><p><strong>Log files</strong>: Application logs capture request parameters, user IDs, IP addresses, sometimes full payloads. Developers copy these into debugging sessions, paste them into chat, attach them to tickets.</p><p><strong>Code repositories</strong>: Test files contain sample data. Configuration files contain connection strings. Comments contain &#8220;temporary&#8221; credentials. README files contain example API calls with real tokens.</p><p><strong>AI prompts</strong>: Users paste customer conversations, error messages, database queries into ChatGPT or Claude for help. These prompts may be used for model training unless the user explicitly opts out.</p><p><strong>Email threads</strong>: A message gets forwarded, then forwarded again. By the fifth hop, nobody remembers that the original contained a customer&#8217;s SSN in the signature block.</p><p><strong>Screenshots</strong>: A developer shares a screenshot of a bug. The browser&#8217;s address bar shows a URL with a session token. The page content shows a user&#8217;s profile.</p><h4>The Regulatory Landscape</h4><p>Data protection regulations have teeth. Under GDPR, fines can reach &#8364;20 million or 4% of global annual revenue&#8212;whichever is higher. CCPA allows statutory damages of $100&#8211;$750 per consumer, per incident. A single leak of 1,000 customer emails could theoretically result in a $750,000 liability.</p><p>But the real cost is often operational:</p><ul><li><p><strong>Breach notification requirements</strong>: GDPR requires notifying the supervisory authority within 72 hours. 
This means incident response, legal review, customer communication&#8212;all on a tight timeline.</p></li><li><p><strong>Right to erasure</strong>: If you can&#8217;t track where data has been copied, you can&#8217;t guarantee deletion.</p></li><li><p><strong>Audit requirements</strong>: Demonstrating compliance requires knowing what data you have and where it lives.</p></li></ul><p>Most breaches don&#8217;t make headlines. They&#8217;re discovered during audits, reported by customers, or found by security researchers. The exposure may have existed for months before detection.</p><h4>Detection is Harder Than It Looks</h4><p>Why doesn&#8217;t everyone just scan for PII before sharing? Because detection is genuinely difficult:</p><p><strong>Format variation</strong>: Phone numbers appear as (555) 123-4567, 555-123-4567, +1 555 123 4567, 5551234567. Email addresses get obfuscated as john[at]example[dot]com. Credit cards have spaces, dashes, or neither.</p><p><strong>False positives</strong>: A 9-digit number might be an SSN or a random ID. A 16-digit number might be a credit card or a tracking number. Without validation, scanners either miss things or flag everything.</p><p><strong>Context matters</strong>: &#8220;John Smith&#8221; in a novel isn&#8217;t PII. &#8220;John Smith, Account #12345&#8221; in a support ticket is. Simple pattern matching can&#8217;t distinguish.</p><p><strong>Secrets are diverse</strong>: AWS keys start with AKIA. GitHub tokens start with ghp_. Stripe keys start with sk_live_. Generic API keys follow no pattern at all. Each requires specific detection logic.</p><p><strong>Encoding layers</strong>: Data gets base64 encoded, embedded in JSON, nested in XML. A scanner that only checks surface text misses encoded content.</p><div><hr></div><h4>Building a Detection Approach</h4><p>Effective PII detection combines multiple techniques:</p><p><strong>Pattern matching</strong> handles well-formatted data. SSNs follow XXX-XX-XXXX. 
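</p><p>To make the idea concrete, here is a minimal sketch of this kind of pattern matching (the regexes are illustrative, not any particular scanner&#8217;s actual rules):</p>

```python
import re

# Illustrative patterns only; real scanners carry many more variants per type.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Accepts (555) 123-4567, 555-123-4567, +1 555 123 4567, and 5551234567.
PHONE = re.compile(r"(?:\+1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}\b")

def find_pii(text):
    """Return (label, matched_text) pairs for every candidate found."""
    hits = []
    for label, pattern in (("ssn", SSN), ("phone", PHONE)):
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits

find_pii("Call (555) 123-4567 about SSN 123-45-6789")
# -> [('ssn', '123-45-6789'), ('phone', '(555) 123-4567')]
```

<p>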
Credit cards match specific prefixes (4 for Visa; 51&#8211;55 or 2221&#8211;2720 for Mastercard). Email addresses have predictable structure.</p><p><strong>Checksum validation</strong> reduces false positives. Credit card numbers include a check digit validated by the Luhn algorithm. IBANs have country-specific formats with built-in verification. A random 16-digit number almost always fails these checks.</p><p><strong>Prefix detection</strong> catches credentials. Cloud provider keys use identifiable prefixes: <code>AKIA</code> for AWS access keys, <code>ghp_</code> for GitHub tokens, <code>sk_live_</code> for Stripe, <code>AIza</code> for Google APIs. Several of these prefixes (GitHub&#8217;s, most explicitly) were designed to make leaked tokens easy to detect.</p><p><strong>Confidence scoring</strong> acknowledges uncertainty. A pattern match against an SSN format with proper separators is high confidence. A 9-digit number without context is low confidence. Surfacing the score lets users prioritize review.</p><div><hr></div><h4>A Tool to Help</h4><p>We built <a href="https://build.valuecurve.co/tools/privacy-scanner/">Privacy Scanner</a> to make PII detection accessible. It&#8217;s a free, browser-based tool that identifies sensitive data in text and files.</p><p>The scanner detects:</p><ul><li><p>Email addresses (including obfuscated formats)</p></li><li><p>Phone numbers (US and international)</p></li><li><p>Social Security Numbers</p></li><li><p>Credit cards (with Luhn validation)</p></li><li><p>Physical addresses (US, UK, EU formats)</p></li><li><p>Bank account numbers and IBANs</p></li><li><p>Cloud credentials (AWS, GitHub, Stripe, Google, Azure, Slack)</p></li><li><p>JWT tokens and private key headers</p></li><li><p>Passwords in plaintext</p></li></ul><p>Each detection includes a confidence score and contributes to an overall risk assessment. 
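</p><p>The Luhn check behind that credit card validation is small enough to sketch in full&#8212;an illustrative version, not the tool&#8217;s internal implementation:</p>

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) not in range(13, 20):  # card numbers run 13 to 19 digits
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# A well-known test card number passes; changing one digit breaks the check.
assert luhn_valid("4111 1111 1111 1111")
assert not luhn_valid("4111 1111 1111 1112")
```

<p>Only about one in ten random digit strings passes the check, which is what makes it such an effective false-positive filter.</p><p>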
The tool generates a redacted preview you can copy directly.</p><p>For sensitive use cases, there&#8217;s a &#8220;Browser-only mode&#8221; where your text never leaves the browser&#8212;the backend only returns coordinates, and masking happens locally.</p><p>No signup required. No data stored.</p><p><strong><a href="https://build.valuecurve.co/tools/privacy-scanner/">Try it here &#8594;</a></strong></p><h4>Building Habits</h4><p>Tools help, but habits matter more. Some practices that reduce accidental exposure:</p><p><strong>Assume logs contain PII</strong>. Before sharing any log output, scan it. Better yet, configure your logging framework to redact sensitive fields at the source.</p><p><strong>Sanitize before escalation</strong>. When forwarding a customer issue, take 30 seconds to remove identifying details that aren&#8217;t necessary for resolution.</p><p><strong>Use separate test data</strong>. Maintain a library of fake but realistic test data. Never copy production data into development environments without anonymization.</p><p><strong>Review before commit</strong>. Add PII scanning to your pre-commit hooks. Catch credentials and test data before they enter version control.</p><p><strong>Question AI prompts</strong>. Before pasting into an LLM, ask: does this contain customer data? Could this identify someone? Is there a way to get the same help with anonymized input?</p><p>Privacy incidents rarely involve sophisticated attacks. They happen when ordinary people do ordinary things without realizing what&#8217;s embedded in the data they&#8217;re handling. The fix isn&#8217;t perfect security&#8212;it&#8217;s awareness and accessible tools.</p><div><hr></div><p><em>Questions or feedback? Post your comments; we&#8217;re improving the scanner based on your feedback.</em></p><div><hr></div>]]></content:encoded></item></channel></rss>