Learn English with Scala on Future and Actor / Sudo Null IT News
I decided to brush up my English. In particular, I wanted to significantly expand my vocabulary. I know that there are a lot of programs that help to do this in a playful way. The catch is that I do not like gamification. I prefer the old-fashioned way: a sheet of paper with a table of words, transcriptions and translations. You learn it, and then you test your knowledge, e.g., by covering the translation column. In general, the way I learned it at university.
I heard that there is a list of the 3000 most commonly used words, selected on the Oxford dictionary site. Here is this list of words: www.oxfordlearnersdictionaries.com/wordlist/english/oxford3000/Oxford3000_A-B I decided to take the translation into Russian from here: www.translate.ru/dictionary/en-ru There is only one problem: nothing on these sites is in a format that can be printed and learned. As a result, the idea was born to program it all. But not as a sequential algorithm: I wanted to parallelize everything, so that downloading and parsing all the words would not take (3000 words * 2 sites * 1 second) / 60 = 100 minutes. That is if you allow 1 second to download and parse a page and extract the translation and transcription (in reality, I think it is 3 times longer, counting the time to open the connection, close it, and so on).
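The estimate above can be written down as a small calculation. This is only a sketch: the object and function names are made up for illustration, and the parallel figure assumes perfect scaling across workers, which a real network will not deliver.

```scala
object DownloadEstimate {
  // Sequential wall-clock time in minutes for `words` words on `sites` sites,
  // assuming every page costs `secondsPerPage` seconds to download and parse.
  def sequentialMinutes(words: Int, sites: Int, secondsPerPage: Double): Double =
    words * sites * secondsPerPage / 60.0

  // Idealized estimate with `workers` pages in flight at once.
  // Assumption: perfect scaling, no contention, no rate limiting.
  def parallelMinutes(words: Int, sites: Int, secondsPerPage: Double, workers: Int): Double =
    sequentialMinutes(words, sites, secondsPerPage) / workers
}
```

With 1 second per page, `sequentialMinutes(3000, 2, 1.0)` gives the 100 minutes from the text, and with the more pessimistic 3 seconds per page it gives 300 minutes.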
I immediately split the task into two large blocks. The first block is blocking I/O operations: downloading a page from a website. The second block is computational operations that do not block but load the CPU: parsing a page to extract the translation and transcription, and adding the parsing results to the dictionary.
I decided to run the blocking operations in a thread pool using Scala's Future. The computational tasks I decided to spread across 3 Akka actors. Applying the TDD technique, I first wrote a test for the building blocks of the future application.
import org.scalatest.{FlatSpec, Matchers}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.language.postfixOps
import scala.util.Try

class Test extends FlatSpec with Matchers {

  "Table Of Content extractor" should "download and extract content from Oxford Site" in {
    val content: List[String] = OxfordSite.getTableOfContent
    content.size should be (10)
    content.find(_ == "A-B") should be (Some("A-B"))
    content.find(_ == "U-Z") should be (Some("U-Z"))
  }

  "Words list extractor" should "download words from page" in {
    val future: Future[Try[Option[List[String]]]] = OxfordSite.getWordsFromPage("A-B", 1)
    val wordsTry: Try[Option[List[String]]] = Await.result(future, 60 seconds)
    wordsTry should be a 'success
    val words = wordsTry.get
    words.get.find(_ == "abandon") should be (Some("abandon"))
  }

  // A page number past the end of the list yields Success(None), not a failure.
  "Words list extractor" should "return None from dead page" in {
    val future: Future[Try[Option[List[String]]]] = OxfordSite.getWordsFromPage("A-B", 999)
    val wordsTry: Try[Option[List[String]]] = Await.result(future, 60 seconds)
    wordsTry should be a 'success
    val words = wordsTry.get
    words should be (None)
  }

  "Russian Translation" should "download translation and parse" in {
    val page: Future[Try[String]] = LingvoSite.getPage("test")
    val pageResultTry: Try[String] = Await.result(page, 60 seconds)
    pageResultTry should be a 'success
    val pageResult = pageResultTry.get
    pageResult.contains("тест") should be (true)
    LingvoSite.parseTranslation(pageResult).get should be ("тест")
  }

  // parseTranslation returns a (transcription, translation) pair.
  "English Translation" should "download translation and parse" in {
    val page: Future[Try[String]] = OxfordSite.getPage("test")
    val pageResultTry: Try[String] = Await.result(page, 60 seconds)
    pageResultTry should be a 'success
    val pageResult = pageResultTry.get
    pageResult.contains("examination") should be (true)
    OxfordSite.parseTranslation(pageResult).get should be (("test", "an examination of somebody's knowledge or ability, consisting of questions for them to answer or activities for them to perform"))
  }
}
Note. Functions that return the result of a computation have the type Try[...]: either a Success with the result or a Failure with an exception. Functions that will be called often and perform blocking I/O have a result like Future[Try[...]]. That is, a call to the function immediately returns a Future inside which the long-running I/O operations take place. Moreover, they run inside a Try and may end with errors (for example, if the connection is broken).
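The Future[Try[...]] shape described in the note can be illustrated with a minimal standard-library sketch. The fetch function here is a hypothetical stand-in that fabricates a page instead of touching the network; it is not the project's getPage.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object FutureTrySketch {
  // Hypothetical stand-in for a page download: returns immediately with a Future,
  // and the body runs inside Try so an exception becomes a Failure value.
  def fetch(word: String)(implicit ec: ExecutionContext): Future[Try[String]] =
    Future {
      Try {
        require(word.nonEmpty, "empty word") // simulated error condition
        s"<html>$word</html>"
      }
    }

  // The caller pattern-matches on Success/Failure once the Future completes.
  def describe(result: Try[String]): String = result match {
    case Success(page) => s"ok: ${page.length} chars"
    case Failure(ex)   => s"failed: ${ex.getMessage}"
  }

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    println(describe(Await.result(fetch("test"), 5.seconds)))
    println(describe(Await.result(fetch(""), 5.seconds)))
  }
}
```

Because the error is captured inside the Try, the Future itself always completes successfully, which is why the tests above can assert on the inner Try after Await.result.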
The application itself is initialized in Top3000WordsApp.scala. The actor system is brought up and the actors are created. Parsing of the word list is started, which in turn starts downloading, in parallel, the English and Russian pages with the transcription and translation. When a page has been downloaded successfully, its content is passed to the actors, which parse it and extract the translation and transcription. The actors pass the result to the final dictionary actor, which accumulates all the results in one place. Pressing Enter shuts down the actor system. The DictionaryActor, on receiving the stop signal, saves the assembled dictionary to the dictionary.txt file.
import java.util.concurrent.Executors
import akka.actor.{ActorSystem, Props}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object Top3000WordsApp extends App {
  val system = ActorSystem("Top3000Words")
  val dictionatyActor = system.actorOf(Props[DictionaryActor], "dictionatyActor")
  val englishTranslationActor = system.actorOf(Props(classOf[EnglishTranslationActor], dictionatyActor), "englishTranslationActor")
  val russianTranslationActor = system.actorOf(Props(classOf[RussianTranslationActor], dictionatyActor), "russianTranslationActor")

  // Separate fixed-size thread pools for the two download stages.
  val mapGetPageThreadExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(16))
  val mapGetWordsThreadExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(16))

  start()
  scala.io.StdIn.readLine() // wait for Enter
  system.shutdown()

  // Stage 1: walk the letter groups ("A-B", ...) of the table of contents.
  def start() = {
    import scala.concurrent.ExecutionContext.Implicits.global
    Future {
      OxfordSite.getTableOfContent.par.foreach(letterGroup => {
        getWords(letterGroup, 1)
      })
    }
  }

  // Stage 2: download one page of the word list and start parsing each word.
  def getWords(letterGroup: String, pageNum: Int): Unit = {
    implicit val executor = mapGetWordsThreadExecutionContext
    OxfordSite.getWordsFromPage(letterGroup, pageNum).map(tryWords => {
      tryWords match {
        case Success(Some(words)) => words.par.foreach(word => {
          parse(word, letterGroup, pageNum)
        })
        case Success(None) => () // no more pages in this letter group
        case Failure(ex) => println(ex.getMessage)
      }
    })
  }

  // Stage 3: download the English and Russian pages and hand them to the actors.
  def parse(word: String, letterGroup: String, pageNum: Int) = {
    implicit val executor = mapGetPageThreadExecutionContext
    OxfordSite.getPage(word).map(tryEnglishPage => {
      tryEnglishPage match {
        case Success(englishPage) => {
          englishTranslationActor ! (word, englishPage)
          // Request the next word-list page; note this runs once per downloaded word.
          getWords(letterGroup, pageNum + 1)
        }
        case Failure(ex) => println(ex.getMessage)
      }
    })
    LingvoSite.getPage(word).map(_ match {
      case Success(russianPage) => {
        russianTranslationActor ! (word, russianPage)
      }
      case Failure(ex) => println(ex.getMessage)
    })
  }
}
Note that the algorithm is divided into the start, getWords and parse functions. This is done because each stage of the task requires its own thread pool, which is passed implicitly as an ExecutionContext. At first, I had only one getWords function with a recursive call. But everything worked very slowly: the parallelization at the top level of the algorithm consumed the entire pool of threads, and the work at the very bottom waited endlessly for a free thread. And it is precisely at the bottom that the largest number of operations takes place.
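The idea of one pool per stage can be sketched with the standard library alone. The stage functions below (wordsFor, pageFor) are hypothetical and only fabricate strings; the pool sizes are arbitrary. The point is that the outer stage submits its work to listPool while the inner stage uses pagePool, so a burst of outer work cannot occupy the threads the inner stage needs.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}

object StagePoolsSketch {
  // One fixed pool per stage (sizes are illustrative assumptions).
  val listPool: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))
  val pagePool: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

  // Hypothetical outer stage: pretend to download the word list of a letter group.
  def wordsFor(group: String): Future[List[String]] =
    Future(List(s"$group-1", s"$group-2"))(listPool)

  // Hypothetical inner stage: pretend to download the page for one word.
  def pageFor(word: String): Future[String] =
    Future(s"page:$word")(pagePool)

  // Chain the stages without blocking inside either pool;
  // only the calling thread waits for the final result.
  def run(groups: List[String]): List[String] = {
    implicit val ec: ExecutionContext = pagePool
    val all = Future.traverse(groups) { group =>
      wordsFor(group).flatMap(words => Future.traverse(words)(pageFor))
    }
    Await.result(all, 10.seconds).flatten
  }

  // Fixed pools use non-daemon threads, so shut them down when done.
  def shutdown(): Unit = {
    listPool.shutdown()
    pagePool.shutdown()
  }
}
```

Calling `StagePoolsSketch.run(List("A-B", "C-D"))` followed by `StagePoolsSketch.shutdown()` exercises both pools; Future.traverse keeps the results in input order.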
Here is the implementation of downloading and parsing pages from the sites.
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import scala.io.Source
import scala.util.Try
import org.jsoup.Jsoup
import org.jsoup.nodes.Element
// scala-scraper imports, assuming the early 0.1.x API that provides `new Browser`
import net.ruippeixotog.scalascraper.browser.Browser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object OxfordSite {
  val getPageThreadExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(16))

  // Extracts a (transcription, translation) pair from a downloaded word page.
  def parseTranslation(content: String): Try[(String, String)] = {
    Try {
      val browser = new Browser
      val doc = browser.parseString(content)
      val spanElement: Element = doc >> element(".phon")
      val str = Jsoup.parse(spanElement.toString).text()
      val transcription = str.stripPrefix("BrE//").stripSuffix("//").trim
      val translation = doc >> text(".def")
      (transcription, translation)
    }
  }

  // Downloads the page of a single word (blocking I/O on a dedicated pool).
  def getPage(word: String): Future[Try[String]] = {
    implicit val executor = getPageThreadExecutionContext
    Future {
      Try {
        val html = Source.fromURL("http://www.oxfordlearnersdictionaries.com/definition/english/" + (word.replace(' ', '-')) + "_1")
        html.mkString
      }
    }
  }

  // Downloads one page of the word list; None means the page has no words.
  def getWordsFromPage(letterGroup: String, pageNum: Int): Future[Try[Option[List[String]]]] = {
    import ExecutionContext.Implicits.global
    Future {
      Try {
        val html = Source.fromURL("http://www.oxfordlearnersdictionaries.com" + "/wordlist/english/oxford3000/Oxford3000_" + letterGroup + "/?page=" + pageNum)
        val page = html.mkString
        val browser = new Browser
        val doc = browser.parseString(page)
        val ulElement: Element = doc >> element(".wordlist-oxford3000")
        val liElements: List[Element] = ulElement >> elementList("li")
        if (liElements.size > 0) Some(liElements.map(_ >> text("a")))
        else None
      }
    }
  }

  // Reads the letter groups ("A-B", ..., "U-Z") from the first page of the list.
  def getTableOfContent: List[String] = {
    val html = Source.fromURL("http://www.oxfordlearnersdictionaries.com/wordlist/english/oxford3000/Oxford3000_A-B/")
    val page = html.mkString
    val browser = new Browser
    val doc = browser.parseString(page)
    val ulElement: Element = doc >> element(".hide_phone")
    val liElements: List[Element] = ulElement >> elementList("li")
    // The current group is rendered as a span, the others as links.
    List(liElements.head >> text("span")) ++ liElements.tail.map(_ >> text("a"))
  }
}

object LingvoSite {
  val getPageThreadExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(16))

  // Extracts the Russian translation from a downloaded page.
  def parseTranslation(content: String): Try[String] = {
    Try {
      val browser = new Browser
      val doc = browser.parseString(content)
      val spanElement: Element = doc >> element(".r_rs")
      spanElement >> text("a")
    }
  }

  // Downloads the Russian translation page of a single word.
  def getPage(word: String): Future[Try[String]] = {
    implicit val executor = getPageThreadExecutionContext
    Future {
      Try {
        val html = Source.fromURL("http://www.translate.ru/dictionary/en-ru/" + java.net.URLEncoder.encode(word, "UTF-8"))
        html.mkString
      }
    }
  }
}
The data structures that the actors work with.
// A dictionary entry; the optional fields are filled in as results arrive.
case class Word(word: String, transcription: Option[String] = None, russianTranslation: Option[String] = None, englishTranslation: Option[String] = None)

// Messages sent by the parsing actors to DictionaryActor.
case class RussianTranslation(word: String, translation: String)
case class EnglishTranslation(word: String, translation: String)
case class Transcription(word: String, transcription: String)
The actors that receive downloaded pages as input, parse them, and forward the translation and transcription to DictionaryActor.
import akka.actor.{Actor, ActorRef}
import scala.util.{Failure, Success}

class EnglishTranslationActor(dictionaryActor: ActorRef) extends Actor {
  println("EnglishTranslationActor")

  def receive = {
    // Input: (word, raw HTML of the Oxford page)
    case (word: String, englishPage: String) => {
      OxfordSite.parseTranslation(englishPage) match {
        case Success((transcription, translation)) => {
          dictionaryActor ! EnglishTranslation(word, translation)
          dictionaryActor ! Transcription(word, transcription)
        }
        case Failure(ex) => {
          println(ex.getMessage)
        }
      }
    }
  }
}

class RussianTranslationActor(dictionaryActor: ActorRef) extends Actor {
  println("RussianTranslationActor")

  def receive = {
    // Input: (word, raw HTML of the Russian translation page)
    case (word: String, russianPage: String) => {
      LingvoSite.parseTranslation(russianPage) match {
        case Success(translation) => {
          dictionaryActor ! RussianTranslation(word, translation)
        }
        case Failure(ex) => {
          println(ex.getMessage)
        }
      }
    }
  }
}
The actor that accumulates the dictionary with translations and transcriptions and, once the actor system shuts down, writes the entire dictionary to dictionary.txt.
import akka.actor.Actor

class DictionaryActor extends Actor {
  println("DictionaryActor")

  // Called when the actor system shuts down: dump one "|"-separated line per word.
  override def postStop(): Unit = {
    println("DictionaryActor postStop")
    val fileText = DictionaryActor.words.map { case (_, someWord) => {
      val transcription = someWord.transcription.getOrElse(" ")
      val russianTranslation = someWord.russianTranslation.getOrElse(" ")
      val englishTranslation = someWord.englishTranslation.getOrElse(" ")
      List(someWord.word, transcription, russianTranslation, englishTranslation).mkString("|")
    }}.mkString("\n")
    scala.tools.nsc.io.File("dictionary.txt").writeAll(fileText)
    println("dictionary.txt saved")
    System.exit(0)
  }

  // Each message fills in one field of the entry, creating the entry if needed.
  def receive = {
    case Transcription(wordName, transcription) => {
      val newElement = DictionaryActor.words.get(wordName) match {
        case Some(word) => word.copy(transcription = Some(transcription))
        case None => Word(wordName, transcription = Some(transcription))
      }
      DictionaryActor.words += wordName -> newElement
      println(newElement)
    }
    case RussianTranslation(wordName, translation) => {
      val newElement = DictionaryActor.words.get(wordName) match {
        case Some(word) => word.copy(russianTranslation = Some(translation))
        case None => Word(wordName, russianTranslation = Some(translation))
      }
      DictionaryActor.words += wordName -> newElement
      println(newElement)
    }
    case EnglishTranslation(wordName, translation) => {
      val newElement = DictionaryActor.words.get(wordName) match {
        case Some(word) => word.copy(englishTranslation = Some(translation))
        case None => Word(wordName, englishTranslation = Some(translation))
      }
      DictionaryActor.words += wordName -> newElement
      println(newElement)
    }
  }
}

object DictionaryActor {
  var words = scala.collection.mutable.Map[String, Word]()
}
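The three branches of receive all do the same thing: look up the entry, fill in one field, store it back. A possible way to see that logic in isolation, without Akka, is the sketch below. It redefines the case classes locally so that it is self-contained, and the sealed Update trait and the merge and toLine helpers are illustrative additions, not part of the project.

```scala
object DictionaryMergeSketch {
  final case class Word(word: String,
                        transcription: Option[String] = None,
                        russianTranslation: Option[String] = None,
                        englishTranslation: Option[String] = None)

  // Illustrative common supertype for the three message kinds.
  sealed trait Update { def word: String }
  final case class Transcription(word: String, transcription: String) extends Update
  final case class RussianTranslation(word: String, translation: String) extends Update
  final case class EnglishTranslation(word: String, translation: String) extends Update

  // Merge one update into an immutable dictionary, creating the entry if missing.
  def merge(dict: Map[String, Word], update: Update): Map[String, Word] = {
    val current = dict.getOrElse(update.word, Word(update.word))
    val next = update match {
      case Transcription(_, t)      => current.copy(transcription = Some(t))
      case RussianTranslation(_, t) => current.copy(russianTranslation = Some(t))
      case EnglishTranslation(_, t) => current.copy(englishTranslation = Some(t))
    }
    dict.updated(update.word, next)
  }

  // Render one entry in the same "|"-separated layout that postStop writes;
  // a missing field becomes a single space.
  def toLine(w: Word): String =
    List(w.word,
         w.transcription.getOrElse(" "),
         w.russianTranslation.getOrElse(" "),
         w.englishTranslation.getOrElse(" ")).mkString("|")
}
```

Folding a list of updates with `updates.foldLeft(Map.empty[String, Word])(merge)` gives the same final entries regardless of the order in which the messages for a word arrive, which is what lets the parsing actors send their results independently.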
What are the results? On my MacBook Pro, this script ran for about 1 hour while I was writing this article. I interrupted it by pressing Enter, and here is the result:
bash-3.2$ cat ./dictionary.txt | wc -l
1809
Then I ran the script again and left it for several hours. When I returned, the processor was loaded at 100%, there were garbage collector errors in the console, and on pressing Enter my program could not save the result of its work to a file. The diagnosis: writing code with Future and par.map or par.foreach is pretty and convenient, of course, but it is really hard to understand how it actually works at the thread level and where the bottleneck is. In the end, I plan to rewrite everything on actors. Moreover, I will use pools of actors. For example, 4 actors would download and parse the pages with word lists, 18 actors would download the pages with translations, 4 actors would parse the pages with translations and transcriptions, and 1 actor would put everything into the dictionary.
The current implementation is in branch v0.1: github.com/evgenyigumnov/top3000words/tree/v0.1 The version where everything is moved to actors with pools will be in branch v0.2, and in master a bit later. Maybe someone has thoughts on what I did wrong in the current version? Or perhaps tips for the new version?
The project is available on GitHub: github.com/evgenyigumnov/top3000words
Run the tests of the project: sbt test
Run the application: sbt run
When you get tired of waiting, press Enter and view the contents of dictionary.txt in the current folder.
PS
As a result, I made the final version, v0.2, which parses everything in 10 minutes using 30 threads. github.com/evgenyigumnov/top3000words/tree/v0.2
You no longer need to press Enter at the end. Everything is done on actors. Only the heavy blocking I/O operations are wrapped in Future.
Posted by: hendersonpentrong1942.blogspot.com