1. Market Research & Platform Selection

Conducted comprehensive research on the Polish job market to identify the most suitable data source. After evaluating multiple job boards, we selected JustJoin.it for its broad coverage of tech job listings and its structured data format.

2. Web Structure Investigation & Scraper Development

Analyzed the website's structure, API endpoints, and data formats. Built a robust web scraper capable of extracting job offers across multiple technology categories (Java, PHP, Ruby, Python, JavaScript, Data).
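JustJoin.it's actual endpoints and pagination scheme are not documented in this overview, so the sketch below deliberately injects a `fetch_page` function instead of hard-coding URLs; it shows only the category/paging loop that such a scraper needs:

```python
from typing import Callable, Iterable, List, Dict

# Target categories, as listed above.
CATEGORIES = ["java", "php", "ruby", "python", "javascript", "data"]

def scrape_category(category: str,
                    fetch_page: Callable[[str, int], List[Dict]]) -> Iterable[Dict]:
    """Yield every offer for one category, paging until an empty page.

    `fetch_page(category, page)` is an injected function that returns one
    page of offers as a list of dicts (the real endpoint is an assumption
    not shown here) or an empty list when the category is exhausted.
    """
    page = 1
    while True:
        offers = fetch_page(category, page)
        if not offers:
            break
        for offer in offers:
            offer["category"] = category  # tag each offer with its source category
            yield offer
        page += 1

def scrape_all(fetch_page: Callable[[str, int], List[Dict]]) -> List[Dict]:
    """Collect offers across all target categories into one combined list."""
    combined: List[Dict] = []
    for category in CATEGORIES:
        combined.extend(scrape_category(category, fetch_page))
    return combined
```

Injecting the fetcher keeps the loop testable offline and makes it trivial to swap in retries or rate limiting later.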

3. Data Scraping

Executed the scraper to collect job offers from all target categories. The raw data was aggregated into offersCombined.json, containing thousands of job postings with details on skills, salaries, locations, and requirements.

4. Core Data Processing Pipeline

Implemented a comprehensive data cleaning and transformation pipeline to standardize and enrich the raw data:

```mermaid
graph TD
    RawData[("Raw Data<br/>(offersCombined.json)")] -->|Load JSON| DeDup{Duplicate URL?}
    DeDup -- Yes --> Skip[Skip Entry]
    DeDup -- No --> Extraction
    subgraph Processing["Processing Pipeline"]
        Extraction[Extract Data]
        %% Location Branch
        Extraction --> LocProc[Location Processing]
        LocProc -->|"Warsaw → Warszawa"| City[Clean City]
        %% Salary Branch
        Extraction --> SalProc[Salary Processing]
        SalProc -->|"Hourly × 168"| Monthly[Monthly Basis]
        Monthly -->|"NBP API Rates"| EurConv[Convert to EUR]
        %% Skill Branch
        Extraction --> SkillProc[Skill Categorizer]
        SkillProc -->|"Embeddings & Cosine Sim"| AI[Sentence Transformer]
        AI -->|"Similarity > 0.65"| Category[Standardized Category]
    end
    City --> ObjBuilder[Build Pydantic Object]
    EurConv --> ObjBuilder
    Category --> ObjBuilder
    ObjBuilder -->|Save| Output[("Clean Data<br/>(ClearOffers2.json)")]
    style RawData fill:#e8d5b7
    style Output fill:#e8d5b7
    style Processing fill:#f5f0e8
```
- Deduplication: Removed duplicate entries based on unique job URLs
- Location Normalization: Standardized city names (e.g., "Warsaw" → "Warszawa")
- Salary Conversion: Converted all salaries to EUR using NBP API exchange rates and normalized hourly rates to a monthly basis
- Skill Categorization: Used ML embeddings (Sentence Transformer) with cosine similarity to group similar skills into standardized categories
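The salary step can be sketched as follows; the 168-hour factor and the NBP A-table mid rate come from the description above, while the function names and signatures are assumptions for illustration:

```python
import json
import urllib.request

HOURS_PER_MONTH = 168  # the pipeline's hourly -> monthly factor

def nbp_mid_rate(currency: str = "EUR") -> float:
    """Fetch the current NBP mid exchange rate (PLN per unit of `currency`).

    Uses the public NBP Web API (table A); response shape per api.nbp.pl.
    """
    url = (f"https://api.nbp.pl/api/exchangerates/rates/a/"
           f"{currency.lower()}/?format=json")
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["rates"][0]["mid"]

def salary_to_monthly_eur(amount: float, unit: str, pln_per_eur: float) -> float:
    """Normalize a PLN salary to a monthly EUR figure.

    Hourly rates are scaled to a monthly basis first; monthly amounts
    pass through unchanged before the currency conversion.
    """
    monthly_pln = amount * HOURS_PER_MONTH if unit == "hour" else amount
    return round(monthly_pln / pln_per_eur, 2)
```

Passing the rate in as a parameter (rather than calling the API inside the converter) means one API call can serve the whole batch of offers.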
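The skill categorizer reduces to nearest-neighbour matching in embedding space. A minimal sketch, assuming an injected `embed` function standing in for a Sentence Transformer model's `encode()` (the helper names below are illustrative, but the 0.65 cutoff matches the pipeline):

```python
from typing import Callable, List, Optional, Sequence

SIM_THRESHOLD = 0.65  # the pipeline's similarity cutoff

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def categorize_skill(skill: str,
                     categories: List[str],
                     embed: Callable[[str], Sequence[float]],
                     threshold: float = SIM_THRESHOLD) -> Optional[str]:
    """Map a raw skill string to its closest standardized category.

    Returns the best-matching category whose cosine similarity clears
    the threshold, or None when no category is similar enough.
    """
    skill_vec = embed(skill)
    best_name, best_sim = None, threshold
    for name in categories:
        sim = cosine(skill_vec, embed(name))
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name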
5. Visualization-Specific Data Processing

Generated specialized datasets for each visualization component:

- `calculateJaccardIndex.js`: Computes skill co-occurrence patterns using the Jaccard similarity index for the skill-relationships network
- `calculateBoxplotData.js`: Generates salary distribution statistics (quartiles, outliers) grouped by skill and experience level
- `processExperienceLevel.js`: Aggregates job-offer counts and statistics by experience level (Junior, Mid, Senior, Lead)
- `processContractType.js`: Analyzes the distribution of contract types (B2B, UoP, etc.) across job offers
- `processWorkMode.js`: Categorizes work arrangements (Remote, Hybrid, Office) for market-trend analysis
- `CategoriesCount.py`: Counts job offers per technology category for the treemap visualization
- `SkillToSalary.py`: Correlates individual skills with salary ranges for skill-value analysis
- `AverageSalary.py`: Calculates average salaries segmented by experience level for career-trajectory insights
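For instance, the Jaccard index behind `calculateJaccardIndex.js` is intersection-over-union of the sets of offers each skill appears in. A Python sketch of the same idea (the actual script is JavaScript, and the `skills` field name is an assumption):

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def jaccard_index(offers_a: Set[int], offers_b: Set[int]) -> float:
    """Jaccard similarity of two skills: |A ∩ B| / |A ∪ B| over offer ids."""
    union = offers_a | offers_b
    if not union:
        return 0.0
    return len(offers_a & offers_b) / len(union)

def skill_cooccurrence(offers: List[Dict]) -> Dict[Tuple[str, str], float]:
    """Map each skill to the set of offers mentioning it, then score pairs."""
    by_skill: Dict[str, Set[int]] = {}
    for i, offer in enumerate(offers):
        for skill in offer["skills"]:
            by_skill.setdefault(skill, set()).add(i)
    return {
        (a, b): jaccard_index(by_skill[a], by_skill[b])
        for a, b in combinations(sorted(by_skill), 2)
    }
```

The resulting pair → score map is exactly the edge-weight list a network visualization needs.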
6. Data → Insights

The final transformation brings processed data to life through interactive visualizations. This entire pipeline was built with a clear purpose: to help us, as developers and data enthusiasts, make informed decisions about which skills and technologies to learn next.

By analyzing thousands of job offers, salary ranges, and skill combinations, we can now see clear patterns in the market. Which technologies are in highest demand? What skills command the best salaries? How do different experience levels affect compensation? What's the optimal career progression path?

These insights transform raw market data into actionable knowledge, empowering anyone to strategically plan their learning journey and career development based on real market trends rather than guesswork.

Explore the Dashboard →