
Contact Details


(98) 938 566 7518
(98) 912 481 4525
Encrypted Emails:

If you want to send me encrypted emails, download my public key from here


  • Adel Rahimi, "Context based stemming for Persian". An algorithm to determine if stemming is needed for a certain query or not.
  • Adel Rahimi, Mohammad Bahrani "Query expansion based on Farsnet".

  • Published
  • Adel Rahimi, Parvane Khosravizadeh (2017) "How MOOCs are Different from Real Classes? A corpus Study". BRAIN: Broad Research in Artificial Intelligence and Neuroscience, Issue 1 of Volume 8, pp. 36-43, Feb 2018. (PDF)
  • Adel Rahimi, (2017) "Improvement of English to Persian Machine Translation via N-grams of Part-of-Speech tags". Proceedings of 3rd regional conference on new achievements in electrical and computer engineering. (PDF)
  • Adel Rahimi, (2015) "The phonetics and the phonology of final boundary tone in Northern Kurdish". Lingbuzz/002902 (Data) (Abstract)
  • Adel Rahimi, (2015) "A hybrid stemming algorithm for Persian". ArXiv: 1507.03077. (PDF)

    This paper proposes a hybrid stemming algorithm for Persian (Farsi). The algorithm combines dictionary look-up with affix removal.
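
    The combination of dictionary look-up and affix removal can be sketched as follows. This is a minimal illustration, not the algorithm from the paper: the function name `hybrid_stem`, the toy lexicon, and the three-suffix list are all assumptions made for the example.

```python
# Toy lexicon of known stems (assumption: a real system would load a full dictionary).
LEXICON = {"کتاب", "درخت", "ایران"}

# A few common Persian suffixes, for illustration only (not the paper's list).
SUFFIXES = ["ها", "ان", "ات"]


def hybrid_stem(word, lexicon=LEXICON, suffixes=SUFFIXES):
    """Return a stem using dictionary look-up first, then affix removal."""
    # Step 1: dictionary look-up. Known words are returned untouched, which
    # protects words such as "ایران" whose ending only looks like a suffix.
    if word in lexicon:
        return word
    # Step 2: affix removal, longest suffix first. A candidate is accepted
    # only when the remainder is itself a dictionary entry.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if candidate in lexicon:
                return candidate
    # Step 3: nothing matched, so the word is left unchanged.
    return word


# "کتاب" + "ها" is the plural "books"; "ایران" must stay intact.
print(hybrid_stem("کتاب" + "ها"))
print(hybrid_stem("ایران"))
```

    Checking the dictionary before stripping anything is what keeps a word like "ایران" from losing its final "ان".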

  • Panel reviewed
  • Adel Rahimi, (2011) "Rootkits" Khwarizmi National Scientific Fair, CSEE section awarded article. (in Persian), (PDF)
  • This is a paper explaining Rootkits and the types of Rootkits.

  • Adel Rahimi. (2015) "Lexical Bundles in Computational linguistics’ academic literature". arXiv:1603.02905. (Data) (Abstract)
  • This paper analyzes lexical bundles in the academic literature of computational linguistics. The study focuses mostly on EAP (English for Academic Purposes).

    Adel Rahimi, (2015) "Critical Discourse analysis of Adolf Hitler’s Speeches". Unpublished Manuscript, IKIU. DOI: 10.6084/m9.figshare.1491387 (Data).

    This paper is a Critical Discourse Analysis of Hitler's speeches. The study is based on a corpus of all his speeches.




    NLTK, Bokeh, Matplotlib, Scipy, Flask, Selenium, Requests, Beautifulsoup, Scrapy, Django, scikit-learn


    NLP, Text mining (TM)


    HTML-CSS (Responsive Designs)


    Elasticsearch, MongoDB (PyMongo) and other NoSQL databases, MySQL and similar SQL databases

    Big data and Data Mining

    Big data tools

    Beginner familiarity with the Hadoop framework, Spark, and MapReduce algorithm

    Data Visualization

    Chart.js, Power BI, Tableau, RDF, OWL


    Project Management

    Familiar with software quality assurance and testing, CMMI, SDLC, and Agile methods; YouTrack (for issue tracking and project management); Jira


    Git (Github, Gitlab, and Bitbucket), Familiar with Machine Learning approaches, Rapidminer, SPSS (Data Mining), Weka, Orange, Apache OpenNLP, Object-oriented designs and MVC Designs, Linux and Mac (Primary), Wordpress, familiar with cloud tools such as Heroku and Amazon S3




    This website uses AI to turn your text into presentations.


    Altervocab is an app that converts informal writing into formal writing using n-grams.


    RFC-Tilt Converter

    A simple tool for converting parameters between the RFC (Rise/Fall/Connection) and Tilt intonation models, in both directions. (Github Page) (Released under GPLv3).
    A new update is coming soon.

    Adel Rahimi. (2015). RFC-Tilt v1.2. Zenodo. DOI: 10.5281/zenodo.29616
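
    The conversion between the two parameter sets can be sketched as follows. This is a minimal illustration based on the commonly cited Tilt model equations, not the code in the repository: the class names `RFCEvent` and `TiltEvent` and both function names are hypothetical, and amplitudes are assumed to be stored as non-negative magnitudes.

```python
from dataclasses import dataclass


@dataclass
class RFCEvent:
    rise_amp: float  # magnitude of the rise (e.g. in Hz)
    rise_dur: float  # duration of the rise (seconds)
    fall_amp: float  # magnitude of the fall, stored as a non-negative value
    fall_dur: float  # duration of the fall (seconds)


@dataclass
class TiltEvent:
    amp: float   # total amplitude: |rise| + |fall|
    dur: float   # total duration: rise + fall
    tilt: float  # shape, from -1 (pure fall) to +1 (pure rise)


def rfc_to_tilt(e: RFCEvent) -> TiltEvent:
    a_rise, a_fall = abs(e.rise_amp), abs(e.fall_amp)
    amp = a_rise + a_fall
    dur = e.rise_dur + e.fall_dur
    if amp == 0 or dur == 0:
        raise ValueError("event needs non-zero amplitude and duration")
    tilt_amp = (a_rise - a_fall) / amp
    tilt_dur = (e.rise_dur - e.fall_dur) / dur
    # The single tilt value is the average of the amplitude and duration tilts.
    return TiltEvent(amp, dur, (tilt_amp + tilt_dur) / 2)


def tilt_to_rfc(t: TiltEvent) -> RFCEvent:
    # Split amplitude and duration between rise and fall according to tilt.
    return RFCEvent(
        rise_amp=t.amp * (1 + t.tilt) / 2,
        rise_dur=t.dur * (1 + t.tilt) / 2,
        fall_amp=t.amp * (1 - t.tilt) / 2,
        fall_dur=t.dur * (1 - t.tilt) / 2,
    )


event = RFCEvent(rise_amp=30.0, rise_dur=0.15, fall_amp=10.0, fall_dur=0.05)
print(rfc_to_tilt(event))
print(tilt_to_rfc(rfc_to_tilt(event)))
```

    Because amplitude tilt and duration tilt are averaged into one number, the round trip recovers the original RFC values only when the two tilts are equal, as in the example event above.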

    Kurmanji Stemmer

    The first stemmer for Kurmanji Kurdish. (Github Page) (Released under GPLv3)
    A rule-based stemmer written in Python that covers most Kurdish suffixes, including: 'ek', 'van', 'dar', 'kar', 'xane', 'stan', 'geh', 'én', 'an', 'yan', 'mend', 'em', 'émin', 'in', 'tir'.
    The second release adds more suffixes and fixes typos.
    If you want to cite any version of the stemmer, visit the Zenodo page for Kurmanji Stemmer.

    Adel Rahimi. (2015). Kurmanji-Stemmer: The second release. Zenodo. DOI: 10.5281/zenodo.29605
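
    Rule-based suffix stripping with the suffixes listed above can be sketched as follows. This is a minimal illustration, not the released code: the function name `stem`, the longest-suffix-first order, the single stripping pass, and the `min_stem_len` guard are all assumptions.

```python
# Suffixes taken from the description above.
SUFFIXES = ['ek', 'van', 'dar', 'kar', 'xane', 'stan', 'geh', 'én',
            'an', 'yan', 'mend', 'em', 'émin', 'in', 'tir']

# Try longer suffixes first so that 'yan' wins over 'an', for example.
ORDERED_SUFFIXES = sorted(SUFFIXES, key=len, reverse=True)


def stem(word, min_stem_len=2):
    """Strip at most one suffix, keeping at least `min_stem_len` characters."""
    word = word.lower()
    for suffix in ORDERED_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # No suffix applies (or the remainder would be too short).
    return word


for w in ["pirtûkxane", "mezintir", "an"]:
    print(w, "->", stem(w))
```

    The minimum stem length is there so that a short word which happens to equal a suffix, such as "an", is not reduced to an empty string.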

    Persian to Kurmanji transliteration

    Simple Persian to Kurmanji transliteration. (Github page)

    Praat scripts

    Batch pitch detection

    This Praat script reports pitch details (maximum, minimum) for labeled tiers across several audio files in one batch. (Github page)

    Pitch Unit Converter

    This Praat script converts between pitch units.
    Units supported: Hertz, Semitone, Bark, Mel. (Github page)
    This script was not originally written by me, but I took the liberty of correcting and commenting it.
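
    The conversions between these four units can be sketched in Python as follows. This is a minimal illustration, not a port of the Praat script: it assumes the formulas that Praat documents for its built-in `hertzToSemitones`, `hertzToMel`, and `hertzToBark` functions, with semitones taken relative to 100 Hz. The `convert` helper and the unit names it accepts are hypothetical.

```python
import math

REF_HZ = 100.0  # assumed semitone reference (semitones re 100 Hz)


def hertz_to_semitones(hz, ref=REF_HZ):
    return 12.0 * math.log2(hz / ref)


def semitones_to_hertz(st, ref=REF_HZ):
    return ref * 2.0 ** (st / 12.0)


def hertz_to_mel(hz):
    return 550.0 * math.log(1.0 + hz / 550.0)


def mel_to_hertz(mel):
    return 550.0 * (math.exp(mel / 550.0) - 1.0)


def hertz_to_bark(hz):
    # asinh(x) equals ln(x + sqrt(1 + x^2)), the form usually written out.
    return 7.0 * math.asinh(hz / 650.0)


def bark_to_hertz(bark):
    return 650.0 * math.sinh(bark / 7.0)


# Hertz is used as the pivot unit, so any pair of units needs two steps at most.
TO_HERTZ = {"hertz": lambda v: v, "semitone": semitones_to_hertz,
            "mel": mel_to_hertz, "bark": bark_to_hertz}
FROM_HERTZ = {"hertz": lambda v: v, "semitone": hertz_to_semitones,
              "mel": hertz_to_mel, "bark": hertz_to_bark}


def convert(value, source, target):
    """Convert a pitch value between 'hertz', 'semitone', 'mel', and 'bark'."""
    return FROM_HERTZ[target](TO_HERTZ[source](value))


print(convert(200.0, "hertz", "semitone"))
print(convert(200.0, "hertz", "mel"))
print(convert(200.0, "hertz", "bark"))
```

    Routing every conversion through Hertz keeps the table to one pair of functions per unit instead of one per unit pair.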


    Kurmanji Speech Corpus

    The first speech corpus for Kurmanji (Kurmanci) Kurdish (in preparation, though partially available).
    Sample files available at:
    For more information regarding the corpus contact me.

    Bàlàxàn Corpus of Kurmanji

    The Balaxan corpus of Kurmanji contains 58 utterances in Kurmanji. (Github page)

  • mp3 format
  • 58 utterances
  • Rahimi, Adel, 2015, "Balaxan corpus of Kurmanji", LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague, HDL:11372/LRT-1531


    Kurmanji Wikipedia Corpus

    A corpus of the Kurmanji-language Wikipedia.

  • txt format
  • ~1 million words
  • ~messy
  • Adel Rahimi, "Kurmaji Wikipedia Corpus", DOI:10.7910/DVN/OHWSUI, Harvard Dataverse Network, V1

    Corpus of World National Anthems

    This is the first corpus of world national anthems, consisting of ~264 national anthems from ~194 countries.
    For more information read the paper: Adel Rahimi. "Corpus study of World National Anthems". DOI:10.13140/RG.2.1.2953.3924 (2015).

  • txt format
  • Adel Rahimi, "Corpus of World National Anthems", DOI:10.7910/DVN/PZG8TH, Harvard Dataverse Network, V1

    Dakhil wordlist for Persian vocabulary

    The Dakhil wordlist consists of roughly 260K Persian words (there may be some duplicates) that end in "ان", "ات", "ون", or "ین" where the ending is part of the word itself rather than a separable suffix. The wordlist is available both as a single txt file covering all four endings and as separate files, one per ending.
    For more information read the paper: Adel Rahimi, (2015) "A hybrid stemming algorithm for Persian". ArXiv: 1507.03077.

  • txt format
  • Adel Rahimi , 2015, "Dakhil wordlist for Persian vocabulary", DOI:10.7910/DVN/MJBHLN, Harvard Dataverse, V1

    Corpus of Computational Linguistics' Academic Literature

    The Corpus of Computational Linguistics' Academic Literature is an 8-million-word corpus of journal publications, books, and theses. It covers interdisciplinary topics such as Speech Recognition, Experimental Phonology, Language Models, Machine Learning, Semantics, Syntactic Theory, and Information Retrieval.

  • txt format
  • 8 million words
  • Adel Rahimi , 2015, "Corpus of Computational Linguistics' Academic Literature", DOI:10.7910/DVN/YHHTCI, Harvard Dataverse, V2

    Swadesh List for Kurmanji

    According to Wikipedia, the Swadesh list is a classic compilation of basic concepts for the purposes of historical-comparative linguistics. Translations of the Swadesh list into a set of languages allow researchers to quantify the interrelatedness of those languages. The Swadesh list is named after the U.S. linguist Morris Swadesh. It is used in lexicostatistics (the quantitative assessment of the genealogical relatedness of languages) and glottochronology (the dating of language divergence). Because there are several different lists, some authors also refer to "Swadesh lists".
    This is a Kurmanji translation of the Swadesh list. (link to page)

    Adel Rahimi , 2015, "swadesh: swadesh list for kurmanji", DOI:10.5281/zenodo.35675, Zenodo, V1.1

    Collective of Kurdish Verbs

    This is a collection of sample Kurmanji verbs and their lemmas. It can be used for educational purposes and/or historical and sociolinguistic studies. (Docx) (PDF)
    I will update the list soon.

  • docx and PDF format
  • 4 verbs

    Corpus of Adolf Hitler's Speeches

    This is the corpus used in the study "Critical Discourse analysis of Adolf Hitler's Speeches".

  • txt format
  • 276K words
  • Adel Rahimi , 2015, "Replication Data for: Rahimi, A., Critical discourse analysis of Adolf hitler's speeches", DOI:10.7910/DVN/SOANL2, Harvard Dataverse, V1