A data privacy ‘GUT Check’ for synthetic media like ChatGPT

March 29, 2023

Bebe Vanek, J.D., Information Privacy Administrator, University of Utah Health Compliance Services

The rise of synthetic media like OpenAI’s ChatGPT is changing the way many types of content are produced and consumed, from academia to entertainment. As with any innovation, synthetic media raises concerns, such as data privacy and security, ethical issues, and the potential to spread misinformation. Ultimately, it’s up to you to determine whether the risks outweigh the benefits.

Need help?

Concerned about a data security incident? Contact the campus IT Help Desk at 801-581-4000, the hospital ITS Service Desk at 801-587-6000, or the Information Security Office's Security Operations Center at SOC@utah.edu for immediate assistance.

Did you receive a malicious or suspicious email? Use the Phish Alert button in UMail or forward the email as an attachment to phish@utah.edu.

Want to learn more? Reach out to the offices below!

Office of General Counsel: Contact Ogc-admin@lists.utah.edu if you are evaluating a service for your organization and are provided with a contract for goods or services.
Privacy Office: Contact baa@utah.edu if a third-party vendor will be accessing, viewing, storing, or using university protected health information (PHI). If the terms of service or contract suggest data collection, a business associate agreement (BAA) may be legally necessary. Contact privacy@utah.edu with general inquiries about information privacy and your rights and responsibilities.
IT Governance, Risk, & Compliance: Contact ISO-GRC@utah.edu if you are assessing software or an information system for your organization. The U’s Information Security Office must evaluate the security of new software or hardware.
PIVOT: Contact the PIVOT Center – Partners for Innovation, Ventures, Outreach & Technology (utah.edu) if you have an idea for innovating systems using apps or software.

Have a privacy topic you’d like to know more about? Contact Bebe Vanek, J.D., information privacy administrator for University of Utah Health Compliance Services, at bebe.vanek@hsc.utah.edu.

Synthetic media is loosely defined as any form of media (visual, textual, audio) generated by or in collaboration with artificial intelligence (AI), such as large language models (LLMs). An LLM is “a deep learning algorithm that can recognize, summarize, translate, predict and generate text and other content based on knowledge gained from massive data sets,” according to NVIDIA. Companies, scholars, and organizations, including government agencies, mine data for compilation into massive data sets, relying on the copyright principle of “fair use,” which permits the limited use of copyrighted material without permission.

Currently, the most noteworthy synthetic media platform is ChatGPT, which boasts over 100 million active users. Using a dialogue format, ChatGPT can “answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests,” according to OpenAI. Other platforms include Google’s Bard, Meta’s LLaMA (Large Language Model Meta AI), and Synthesis.

As with every innovation, there are immediate fans, stans, and naysayers. Companies, such as BuzzFeed and ESPN, have voiced support for the technology, indicating that synthetic media will be used to create public-facing content, such as BuzzFeed’s quizzes and ESPN’s commentary. Architects and visual designers also are experimenting with visual synthetic media in their drafting to save time and money.

Some privacy professionals express concern that innovations that use synthetic media will cause more harm than good because they increase avenues for malicious attacks. Criminals have already created websites that impersonate ChatGPT and other OpenAI platforms to phish personal and financial information or prompt users to download files containing malware. Other concerns include the technology’s capability to produce believable fake comments, videos, or other media, leading to a spread of misinformation. Educational institutions are grappling with ChatGPT-written research papers, with ChatGPT appearing as a co-author of at least four published papers and preprints, despite unsettled legal questions of who (or what) owns content generated by large language models.

Some companies are working to protect user data by building corrective technology to identify plagiarism and “deep fakes” created by synthetic media. Lawyers and legislators are bringing legal and ethical questions forward, encouraging data protection-focused terms of use.

One thing is certain: Synthetic media is here to stay.

That’s why it’s important to increase our technological literacy and implement healthy, curious, and cautious data privacy habits to protect against malicious threat actors, accidental disclosure of sensitive or personal information, and legal liability. To accomplish this, implement a “GUT Check” when using new technology to keep your data protected.

The “GUT Check” for data privacy:

G — Generated: How was the content generated?

U — Understand: Do you understand the terms and conditions of use?

T — Time: Take time before you disclose your data. Are you OK with how it will be used?

Generated: The first step to producing and consuming synthetic media safely is understanding how it was generated. Developers are working to create tools to evaluate whether content was generated with AI, but until these applications are integrated into our smartphones, computers, and other devices, be cautious when engaging with all media. Investigate the origin of content, research the name of the creator(s), and evaluate whether the content seems suspicious. Evaluate whether the content has an unsettling or unnatural appearance, known as the “uncanny valley” effect. Finally, seek to understand the technology by researching how large language models work.

Understand: The second step is understanding the terms and conditions for synthetic media content generators. Let’s face it: Terms and conditions are often long and boring legalese standing between you and the latest content and technology. Ideally, terms and conditions will soon be shortened and made accessible through state or federal privacy regulations, but until then, pay particular attention to the following sections and be sure to understand them before you agree to the terms of use:

Data use and ownership: Who owns the data you input? Who owns the data you produce? How will corporations and organizations use the data you input?

Data retention: How long will the corporation or organization keep your data? Can you review the data that has been collected, amend it, or delete it?

Consent and opt-out: Can you opt out of, or consent (opt in) to, your data being stored, used, or sold by the company?

Time: The final step is to take time to evaluate whether you want to use the technology once you understand how the company or organization plans to store and use your data. If the technology is free to use, it’s likely that the company is benefiting from the user in some way, either by collecting data to sell to other companies or by using it to train the large language model and improve the technology. If you choose to proceed, pause for a moment to reflect on the information you plan to share before you upload it. Know that everything you do online leaves a digital footprint, and be protective of your personal information, knowing that once you upload it, you can lose control over it.
