PYTHON SCRIPT v2.1

Social Media Trend Detector

An automated Python tool that scrapes unstructured social media data (Twitter/X & Reddit) to identify rising consumer interests before they hit mainstream news.

Processing Speed 5k Posts/min

Accuracy 92%

Tech Stack Python, Pandas

System Architecture

The script follows an ETL (Extract, Transform, Load) pipeline structure.

Scraper BeautifulSoup & Selenium

→

Cleaner Regex & NLTK

→

Analyzer Pandas Frequency Map

→

Visualizer Matplotlib & Seaborn

Step 1: The Scraper

We use Selenium to handle dynamic JavaScript content and scroll infinitely to gather historical data.

scrapper.py

def fetch_tweets(keyword, count):
    driver = webdriver.Chrome()
    driver.get(f"https://twitter.com/search?q={keyword}")
    
    data = []
                    while len(data) < count:
        tweets = driver.find_elements(By.CSS_SELECTOR, '[data-testid="tweet"]')
                    for tweet in tweets:
            text = tweet.text
            timestamp = tweet.find_element(By.TAG_NAME, 'time').get_attribute('datetime')
            data.append({'text': text, 'date': timestamp})
        
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
        
                    return pd.DataFrame(data)

Step 2: Data Cleaning

Raw social media data is messy. We remove emojis, URLs, and stop-words using Regex.

cleaner.py

import re
from nltk.corpus import stopwords

def clean_text(text):
                    # Remove URLs
    text = re.sub(r'http\S+', '', text)
                    # Remove Emojis & Special Chars
    text = re.sub(r'[^\w\s]', '', text)
                    # Lowercase & Remove Stopwords
    words = [w for w in text.lower().split() if w not in stopwords.words('english')]
    
                    return " ".join(words)

Step 3: Trend Visualization

We use Seaborn to create a time-series heatmap of keyword frequency.

Keyword Frequency: "Generative AI" (Last 30 Days)

Day 1

Day 5

Day 10

Day 15

Day 20

Day 25

Day 30

Final Business Impact

This script was deployed for a niche e-commerce client to track "Sustainable Fashion" trends.

Early Detection: Identified "Bamboo Fabric" trend 2 weeks before competitors.
Cost Saving: Replaced $2,000/mo manual research agency.
Automation: Runs daily at 08:00 AM via Cron Job.