Social Media Trend Detector
An automated Python tool that scrapes unstructured social media data (Twitter/X & Reddit) to identify rising consumer interests before they hit mainstream news.
Processing Speed
5k Posts/min
Accuracy
92%
Tech Stack
Python, Pandas
System Architecture
The script follows an ETL (Extract, Transform, Load) pipeline structure.
Scraper
BeautifulSoup & Selenium
→
Cleaner
Regex & NLTK
→
Analyzer
Pandas Frequency Map
→
Visualizer
Matplotlib & Seaborn
Step 1: The Scraper
We use Selenium to handle dynamic JavaScript content and scroll infinitely to gather historical data.
def fetch_tweets(keyword, count):
driver = webdriver.Chrome()
driver.get(f"https://twitter.com/search?q={keyword}")
data = []
while len(data) < count:
tweets = driver.find_elements(By.CSS_SELECTOR, '[data-testid="tweet"]')
for tweet in tweets:
text = tweet.text
timestamp = tweet.find_element(By.TAG_NAME, 'time').get_attribute('datetime')
data.append({'text': text, 'date': timestamp})
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(2)
return pd.DataFrame(data)
Step 2: Data Cleaning
Raw social media data is messy. We remove emojis, URLs, and stop-words using Regex.
import re
from nltk.corpus import stopwords
def clean_text(text):
# Remove URLs
text = re.sub(r'http\S+', '', text)
# Remove Emojis & Special Chars
text = re.sub(r'[^\w\s]', '', text)
# Lowercase & Remove Stopwords
words = [w for w in text.lower().split() if w not in stopwords.words('english')]
return " ".join(words)
Step 3: Trend Visualization
We use Seaborn to create a time-series heatmap of keyword frequency.
Keyword Frequency: "Generative AI" (Last 30 Days)
Final Business Impact
This script was deployed for a niche e-commerce client to track "Sustainable Fashion" trends.
- Early Detection: Identified "Bamboo Fabric" trend 2 weeks before competitors.
- Cost Saving: Replaced $2,000/mo manual research agency.
- Automation: Runs daily at 08:00 AM via Cron Job.